Abstract

We provide a complete characterization of the rate-distortion region for the multistage 
successive refinement of the Wyner-Ziv source coding problem with degraded side infor- 
mations at the decoder. Necessary and sufficient conditions for a source to be successively 
refinable along a distortion vector are subsequently derived. A source-channel separation 
theorem is provided when the descriptions are sent over independent channels for the mul- 
tistage case. Furthermore, we introduce the notion of generalized successive refinability 
with multiple degraded side informations. This notion captures whether progressive en- 
coding to satisfy multiple distortion constraints for different side informations is as good 
as encoding without progressive requirement. Necessary and sufficient conditions for gen- 
eralized successive refinability are given. It is shown that the following two sources are 
generalized successively refinable: (1) the Gaussian source with degraded Gaussian side 
informations, (2) the doubly symmetric binary source when the worse side information is 
a constant. Thus for both cases, the failure of being successively refinable is only due to 
the inherent uncertainty on which side information will occur at the decoder, but not the 
progressive encoding requirement. 



1 Introduction 



The notion of successive refinement of information was introduced by Koshelev [1] and by 
Equitz and Cover [2], whose interest was to determine whether the requirement of encoding a 
source progressively necessitates a higher rate than encoding without the progressive require- 
ment. A source is said to be successively refinable if encoding in multiple stages incurs no 
rate loss as compared with optimal rate-distortion encoding at the separate distortion levels. 
Rimoldi [3] later provided a complete characterization of the rate-distortion region for this 
problem. 

In another seminal paper, Wyner and Ziv [4] characterized the rate-distortion function for 
encoding a source when the decoder alone has access to side information correlated with the 
source. The notion of successive refinement was combined with the presence of side information
by Steinberg and Merhav [5], who formulated the problem of successive refinement with
degraded side informations at the decoder. The degradedness roughly means that the decoder 
receiving the higher rate bit-stream also has access to the "better quality" side information. 
More formally, this means the source and side-informations arranged in the descending order 
according to the rate of the bitstream form a Markov chain. The notion of successive refinability
with degraded side informations was consequently defined, which answers the question
whether such progressive encoding causes rate loss as compared with single-stage Wyner-Ziv
coding. In this context, the main result in [5] was the characterization of the rate-distortion
region and the necessary and sufficient conditions for successive refinability for two-stage
systems. The characterization for more than two stages was left open. An achievable region was
indeed given; however, the converse proof was not found.$^1$

In this work we extend these ideas in several ways. First, the question left open by Stein- 
berg and Merhav is resolved, which is the characterization of the rate-distortion region for the 
successive refinement under the Wyner-Ziv setting, for any finite number of degraded side in- 
formations. This is accomplished by an alternative representation of the rate region based on 
rate-sums. This characterization overcomes the difficulty perhaps encountered by Steinberg 
and Merhav, in proving the converse for the general multistage achievable region they found. 
The achievable region provided in [5] is then analyzed and shown to be equivalent to the rate- 
distortion region. Necessary and sufficient conditions for a source to be successively refinable 
are derived. 

The notion of successive refinability introduced by Steinberg and Merhav can be quite re- 
strictive. This can be understood in the context of work of Heegard and Berger [6], as well 
as Kaspi [7], who studied the problem of source coding when a correlated side information 
may or may not be available at the decoder. In particular, it was shown that when transmis- 
sion was to multiple decoders with degraded side informations, the rate distortion function 
could exceed the Wyner-Ziv rate needed for the decoder with the "stronger" side information, 
as well as that needed for the decoder with the "weaker" side information. As such, sources 
can fail to be successively refinable (with side information) simply due to this reason. This 
motivates our definition of generalized successive refinability of sources when decoders have 
access to multiple side informations. In this notion we only require the sum-rate of the pro- 
gressive encoding to match the Heegard-Berger rate for degraded side informations, instead of 
the Wyner-Ziv rate. Necessary and sufficient conditions for a source to have this property are 
then given. This notion of generalized successive refinability is applied to Gaussian sources 
with jointly Gaussian side informations and quadratic distortion measure. It is shown that the 
Gaussian source is actually successively refinable in the generalized sense, though it fails to be 
successively refinable in the strict sense as defined by Steinberg and Merhav in most cases. An 
explicit calculation is also given for the doubly symmetric binary source (DSBS) under Ham- 
ming distortion measure, when the worse side information is a constant, which we show is also 
successively refinable in the generalized sense. The explicit calculation of the rate-distortion 
region for the DSBS in fact gives the Heegard-Berger rate-distortion function, which,
to the best of our knowledge, had not been found despite several attempts [6, 8-10].

$^1$In fact, the complete rate-distortion region for the multistage system with identical side information was given;
however, this only addresses a special case in the framework.

Figure 1: A three-stage successive refinement system with side informations. The side informations
are degraded in the sense that $X \leftrightarrow Y_3 \leftrightarrow Y_2 \leftrightarrow Y_1$.



The result can be generalized to the scenario when the descriptions are transmitted over $N$
independent discrete memoryless channels (DMCs). In a more recent work [11], Steinberg and
Merhav showed that a source-channel separation result holds for the two-stage case. In light of
our new result, it can be shown that such separation holds for the multistage case as well.

The rest of the paper is organized as follows. In Section 2 we define the problem and
establish the notation. In Section 3 a characterization is provided for the rate-distortion region
with an arbitrary finite number of stages, which resolves the question left open in [5].
Section 4 begins with the necessary and sufficient conditions for a source to be successively
refinable; then the notion of generalized successive refinability is introduced and investigated.
The Gaussian example is explored in Section 5 and the doubly symmetric binary source is
investigated in Section 6. Section 7 concludes this paper with a brief discussion. Proof details are
given in the appendices.



2 Notation and Problem Statement 

Let $\mathcal{X}$ be a finite set and let $\mathcal{X}^n$ be the set of all $n$-vectors with components in $\mathcal{X}$. Denote
an arbitrary member of $\mathcal{X}^n$ as $x^n = (x_1, x_2, \ldots, x_n)$, or alternatively as $\boldsymbol{x}$ when the dimension
$n$ is clear from the context. Upper case is used for random variables and vectors. A discrete
memoryless source (DMS) $(\mathcal{X}, P_X)$ is an infinite sequence $\{X_i\}_{i=1}^{\infty}$ of independent copies of a
random variable $X$ in $\mathcal{X}$ with a generic distribution $P_X$:

\[ P_X(x^n) = \prod_{i=1}^{n} P_X(x_i). \tag{1} \]

Similarly, let $(\mathcal{X}, \mathcal{Y}_1, \mathcal{Y}_2, \ldots, \mathcal{Y}_N, P_{XY_1Y_2 \ldots Y_N})$ be a discrete memoryless multisource with generic
distribution $P_{XY_1Y_2 \ldots Y_N}$, where $N$ is the number of coding stages.
Let $\hat{\mathcal{X}}$ be a finite reconstruction alphabet, and let

\[ d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty) \tag{2} \]
be a distortion measure. For simplicity, we will assume the decoders at all the stages use
the same reconstruction alphabet and have the same distortion measure. The generalization
to different distortion measures and reconstruction alphabets is quite simple. The per-letter
distortion of a vector is defined as

\[ d(\boldsymbol{x}, \hat{\boldsymbol{x}}) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i), \quad \forall \boldsymbol{x} \in \mathcal{X}^n, \ \hat{\boldsymbol{x}} \in \hat{\mathcal{X}}^n. \tag{3} \]

All logarithms in this work are taken to base 2.
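
As an illustration of the per-letter distortion in (3), the following minimal Python sketch averages a single-letter distortion over a block. The function names `hamming` and `per_letter_distortion` are hypothetical and chosen only for this example; the Hamming measure is used because it is the measure considered later for the DSBS.

```python
from typing import Callable, Sequence


def hamming(x: int, x_hat: int) -> float:
    """Single-letter Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if x == x_hat else 1.0


def per_letter_distortion(
    x: Sequence[int],
    x_hat: Sequence[int],
    d: Callable[[int, int], float] = hamming,
) -> float:
    """Average of d(x_i, x_hat_i) over the block, as in equation (3)."""
    if len(x) != len(x_hat):
        raise ValueError("source and reconstruction blocks must have equal length")
    if len(x) == 0:
        raise ValueError("blocks must be non-empty")
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)


# One mismatch out of four symbols gives distortion 1/4.
example = per_letter_distortion([0, 1, 1, 0], [0, 1, 0, 0])
```

Any other single-letter measure $d$ can be passed in place of `hamming` without changing the averaging step.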

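Definition 1 An $(n, M_1, M_2, \ldots, M_N, D_1, D_2, \ldots, D_N)$ successive refinement (SR) code for source
$X$ with side information $(Y_1, Y_2, \ldots, Y_N)$ consists of $N$ encoding functions $\phi_m$, $m = 1, 2, \ldots, N$,
and $N$ decoding functions $\psi_m$, $m = 1, 2, \ldots, N$:

\[ \phi_m : \mathcal{X}^n \to I_{M_m}, \tag{4} \]
\[ \psi_m : I_{M_1} \times I_{M_2} \times \cdots \times I_{M_m} \times \mathcal{Y}_m^n \to \hat{\mathcal{X}}^n, \tag{5} \]

where $I_k = \{1, 2, \ldots, k\}$, such that

\[ E d(X^n, \psi_m(\phi_1(X^n), \phi_2(X^n), \ldots, \phi_m(X^n), Y_m^n)) \le D_m, \tag{6} \]

where $E$ is the expectation operation.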

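Definition 2 A rate vector $R = (R_1, R_2, \ldots, R_N)$ is said to be $D = (D_1, D_2, \ldots, D_N)$ achievable,
if for every $\epsilon > 0$ there exists for sufficiently large $n$ an $(n, M_1, M_2, \ldots, M_N, D_1 + \epsilon, D_2 + \epsilon, \ldots, D_N + \epsilon)$
SR code with

\[ R_m + \epsilon \ge \frac{1}{n} \log M_m, \quad m = 1, 2, \ldots, N. \tag{7} \]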

A three-stage example is given in Fig. 1. Denote the collection of all the $D$ achievable rate
vectors as $\mathcal{R}(D)$; this is the region to be characterized. When the side informations have
arbitrary dependence among them, the problem appears to be difficult. As in [5], we consider
only the case of degraded side informations with a particular order, which is given by the
Markov condition $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$. One of our main results is the complete
characterization of this region, given in the next section.

We can further consider the case when the descriptions are transmitted over $N$ independent
discrete memoryless channels (DMCs) (see Fig. 2). For simplicity, instead of using the more
general model where the channels are cost-constrained as in [11], we only consider channels
without constraints; however, such an extension can be done without much difficulty.

Definition 3 An $(n, n_1, n_2, \ldots, n_N, D_1, D_2, \ldots, D_N)$ source-channel SR (SC-SR) code for source
$X$ with side information $(Y_1, Y_2, \ldots, Y_N)$ for independent channels given by $P_{Y_{c,m}|X_{c,m}}$, $m = 1, 2, \ldots, N$,
consists of $N$ encoding functions $\phi_m$, $m = 1, 2, \ldots, N$, and $N$ decoding functions
$\psi_m$, $m = 1, 2, \ldots, N$:

such that

\[ E d(X^n, \psi_m(Y_{c,1}, Y_{c,2}, \ldots, Y_{c,m}, Y_m)) \le D_m. \tag{10} \]
Figure 2: The corresponding source-channel coding problem for the source coding system
depicted in Fig. 1.



Definition 4 A distortion vector $D = (D_1, D_2, \ldots, D_N)$ is said to be SC-SR achievable for
source $P_{XY_1Y_2 \ldots Y_N}$ and channels $P_{Y_{c,m}|X_{c,m}}$, $m = 1, 2, \ldots, N$, under bandwidth expansion factor
$(\rho_1, \rho_2, \ldots, \rho_N)$, if for every $\epsilon > 0$ there exists for sufficiently large $n$ an
$(n, n\rho_1, n\rho_2, \ldots, n\rho_N, D_1 + \epsilon, D_2 + \epsilon, \ldots, D_N + \epsilon)$ SC-SR code. The achievable SC-SR distortion region
$\mathcal{D}(\rho_1, \rho_2, \ldots, \rho_N)$ is the collection of all the SC-SR achievable distortion vectors under the given
bandwidth expansion factors.



3 The Characterization of the Rate-distortion Region with 
Degraded Side Information 

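Define the region $\mathcal{R}^*(D)$ to be the set of all rate vectors $R = (R_1, R_2, \ldots, R_N)$ for which there
exist $N$ random variables $(W_1, W_2, \ldots, W_N)$ in finite alphabets $\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_N$ such that the
following conditions are satisfied.

1. $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$.

2. There exist deterministic maps $f_m : \mathcal{W}_m \times \mathcal{Y}_m \to \hat{\mathcal{X}}$ such that

\[ E d(X, f_m(W_m, Y_m)) \le D_m, \quad 1 \le m \le N. \tag{11} \]

3. The alphabet sizes satisfy

\[ |\mathcal{W}_1| \le |\mathcal{X}| + 2N - 1, \qquad |\mathcal{W}_m| \le |\mathcal{X}| \prod_{i=1}^{m-1} |\mathcal{W}_i| + 2N - 2m - 1, \quad m = 2, 3, \ldots, N. \tag{12} \]

4. The non-negative rate vectors satisfy:

\[ \sum_{i=1}^{m} R_i \ge \sum_{i=1}^{m} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i), \quad 1 \le m \le N, \tag{13} \]

where we have used the convention that $W_0 = \emptyset$, i.e., the null set.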
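
As a worked special case (a sketch written out here for concreteness, not an additional result), taking $N = 2$ in the sum-rate condition (13) with the convention $W_0 = \emptyset$ gives the two constraints

```latex
\begin{align*}
R_1 &\ge I(X; W_1 | Y_1), \\
R_1 + R_2 &\ge I(X; W_1 | Y_1) + I(X; W_2 | W_1, Y_2).
\end{align*}
```

The right-hand side of the second constraint has the same form as the expression minimized in the two-stage Heegard-Berger rate-distortion function recalled in Section 5.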
Remark 1 Because of the conditioning on $W_1, W_2, \ldots, W_{m-1}$ in the rate expressions, it is
clear that the function $f_m(W_m, Y_m)$ can also be written as $f'_m(W_1, W_2, \ldots, W_m, Y_m)$ without
essential difference in the definition of the region. This equivalence will be used in the explicit
calculation of the rate-distortion region in Sections 5 and 6. Furthermore, more structure can
be built into the random variables $W_m$, such that the Markov chain holds as follows:
$W_1 \leftrightarrow W_2 \leftrightarrow \cdots \leftrightarrow W_N \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$; however, such additional structure
requires an increase in the cardinality of the alphabets (see discussions in [5]).

The following theorem establishes the rate-distortion region, which is one of the main re- 
sults of the paper. 

Theorem 1 For any discrete memoryless stochastically degraded source $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$,

\[ \mathcal{R}(D) = \mathcal{R}^*(D). \tag{14} \]

The achievability of the region is quite straightforward. The $m$-th stage codebook of overall
size $2^{n(I(X; W_m | W_1, W_2, \ldots, W_{m-1}) + \epsilon_m)}$ is generated uniformly at random from $T_{[W_m | w_1, w_2, \ldots, w_{m-1}]_\delta}$,
where $T_{[W_m | w_1, w_2, \ldots, w_{m-1}]_\delta}$ denotes the set of $\delta$-typical sequences given lower-hierarchy
codewords $(w_1, w_2, \ldots, w_{m-1})$. These codewords are then placed into $2^{n(I(X; W_m | W_1, W_2, \ldots, W_{m-1}, Y_m) + 2\epsilon_m)}$
bins using a uniform distribution. The decoder block-decodes $W_m$ in the $m$-th stage (using
the side information), conditioned on the lower-hierarchy codewords; since the side
informations are degraded, each higher hierarchy can always decode the lower-hierarchy
codewords. From the above interpretation, it is seen that the proof of the achievability of the region
essentially uses the hierarchy of random codes as in the proof of the two-stage case in [5]. Thus
we will focus on the converse part of the proof of the theorem, which is given in Appendix A.
A source-channel separation result is now stated, and the proof is given in Appendix B.

Theorem 2 For any discrete memoryless stochastically degraded source $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$,
and $N$ independent discrete memoryless channels given by $P_{Y_{c,m}|X_{c,m}}$, $m = 1, 2, \ldots, N$,
the distortion vector $D = (D_1, D_2, \ldots, D_N)$ is achievable under bandwidth expansion factors
$(\rho_1, \rho_2, \ldots, \rho_N)$, if and only if there exist random variables $(W_1, W_2, \ldots, W_N)$ in finite alphabets
$\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_N$ satisfying conditions 1), 2), 3) in the definition of $\mathcal{R}^*(D)$ and furthermore,

\[ \sum_{i=1}^{m} \rho_i C_i \ge \sum_{i=1}^{m} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i), \quad 1 \le m \le N, \tag{15} \]

where $C_i$ is the channel capacity of channel $i$.

The rate region given in Theorem 1 is in a different form than the achievable region given
in [5]. Here $\mathcal{R}^*(D)$ is given in terms of the sum-rate at each stage, including rates at the
previous stages, the sufficiency of which was formally established in [12]. The achievable
region in [5], denoted as $\mathcal{R}'(D)$ here, involves $(N+1)N/2$ random variables, and is given in
terms of the individual rate $R_m$ at each stage. It is provided below for ease of comparison: $\mathcal{R}'(D)$
is defined as the set of all rate vectors $(R_1, R_2, \ldots, R_N)$ for which there exists a collection of
$(N+1)N/2$ random variables $\{V_{i,j}, 1 \le i \le N, i \le j \le N\}$, where $V_{i,j}$ takes values in a
finite set $\mathcal{V}_{i,j}$, such that the following conditions are satisfied.
Figure 3: An example in which the achievability results implied by the two regions are equivalent, but the two
regions are not the same. One region is the singleton point labeled by the star; the other region
is the shaded region, which includes this singleton point.

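1. $\{V_{i,j}, 1 \le i \le N, i \le j \le N\} \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$.

2. There exist deterministic maps $f_m : \mathcal{V}_{m,m} \times \mathcal{Y}_m \to \hat{\mathcal{X}}$ such that

\[ E d(X, f_m(V_{m,m}, Y_m)) \le D_m, \quad 1 \le m \le N. \tag{16} \]

3. The rate vectors satisfy:

\[ R_1 \ge I(X; V_{1,1} | Y_1) + \sum_{k=2}^{N} I(X; V_{1,k} | V_{1,1}, V_{1,2}, \ldots, V_{1,k-1}, Y_k), \tag{17} \]

\[ R_m \ge I(X; V_{m,m} | \{V_{i,j}, 1 \le i < m, i \le j \le m\}, Y_m) + \sum_{k=m+1}^{N} I(X; V_{m,k} | \{V_{i,j}, 1 \le i \le m, i \le j \le k-1\}, Y_k), \quad 2 \le m \le N. \tag{18} \]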

It is clear that the characterization $\mathcal{R}^*(D)$ given in Theorem 1 is more concise. However,
it can indeed be shown that these two regions are equivalent, and we establish this equivalence
as a theorem.

Theorem 3 For any discrete memoryless stochastically degraded source $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$,

\[ \mathcal{R}'(D) = \mathcal{R}^*(D) = \mathcal{R}(D). \tag{19} \]

The second equality obviously follows from Theorem 1. Theorem 3 is proved in Appendix C,
which might be of interest for the following reason. In [5], a proof for a similar but different
claim was given for the special case of $N = 2$, which showed that the achievability of $\mathcal{R}'(D)$
and that of $\mathcal{R}^*(D)$ are equivalent. However, this does not directly imply that the two regions are
equivalent; see Fig. 3 for such an example. In our proof, the fact that $\mathcal{R}^*(D) = \mathcal{R}(D)$ is
used; and since $\mathcal{R}'(D)$ is an achievable region, we have trivially $\mathcal{R}^*(D) \supseteq \mathcal{R}'(D)$. However,
without invoking $\mathcal{R}^*(D) = \mathcal{R}(D)$, it appears difficult to prove this inclusion. Interestingly,
for $N = 2$, it is indeed possible to prove Theorem 3 without invoking $\mathcal{R}^*(D) = \mathcal{R}(D)$, and
this alternative proof is also included in Appendix C.

The following observation might shed some light on why a direct proof of $\mathcal{R}'(D) = \mathcal{R}(D)$
might be difficult, and it also provides the necessary intuition in proving Theorem 3. Consider
the case $N = 3$; the random variable $V_{1,3}$ is the information that the first stage encoded for the
third stage. However, if the second stage still has to encode $V_{2,2}$ with a nonzero rate, then the
encoder cannot encode $V_{2,2}$ conditioned on $V_{1,3}$, since the second stage decoder will not be able
to decode $V_{1,3}$. Furthermore, $V_{1,3}$ does not help in the second stage decoder either. As such the
encoder might as well encode $V_{1,3}$ after $V_{2,2}$ is encoded, which can then be conditioned on $V_{2,2}$
to reduce the rate. Thus the optimal scheme is to encode the first stage random variable $V_{1,1}$; if
there is additional bit budget left in the first stage, then adjust and encode $V_{1,2}$ conditioned on
$V_{1,1}$ until $V_{1,2} = V_{2,2}$; and if there is still additional bit budget left, then adjust and encode $V_{1,3}$
conditioned on $(V_{1,1}, V_{2,2})$ until $V_{1,3} = V_{3,3}$, etc.; this process carries on for each stage sequentially.
Thus the majority of the $N(N+1)/2$ random variables are in fact null random variables, which
reflect the change of the coding strategy at boundary points. This inherent change of encoding
strategy appears to pose difficulty in proving the converse using $\mathcal{R}'(D)$.

The example in Fig. 3 can also be explained by introducing the following useful property.

Property 1 A region $\mathcal{K}$ is said to be sum-incremental if the following is true: if $R \in \mathcal{K}$, then
for any non-negative rate vector $R'$ that satisfies $\sum_{i=1}^{m} R'_i \ge \sum_{i=1}^{m} R_i$ for all $1 \le m \le N$,
$R' \in \mathcal{K}$.

It was shown in [12] that for successive refinement coding without side information, the
rate region is sum-incremental. Using the same method, it can be shown that it is also true
for the rate-distortion region $\mathcal{R}(D)$ of successive refinement coding in the Wyner-Ziv setting.
Intuitively, this property states that "it does not matter how you divide up the rate between
layers of the (successively refining) descriptions, as long as the sum-rate of first m layers
is sufficiently high for each $m = 1, 2, \ldots, N$" [12]: we can simply move the rate in higher
stages into lower stages to form new codes. The shaded region in Fig. 3 is sum-incremental,
while the singleton point labeled by the star is not. Thus the shaded region can be a valid
rate-distortion region for the successive refinement problem, while the singleton point cannot, though
the two regions imply the same achievability result. Now notice that it is quite difficult (even
if not impossible) to prove that $\mathcal{R}'(D)$ is sum-incremental, which suggests it will be difficult to prove
$\mathcal{R}'(D) = \mathcal{R}(D)$ directly.
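
The partial-sum comparison in Property 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `dominates_in_partial_sums` and the numerical rate vectors are hypothetical, and the function checks only the partial-sum condition between two vectors, not membership in any rate region.

```python
from itertools import accumulate
from typing import Sequence


def dominates_in_partial_sums(
    r_new: Sequence[float], r_ref: Sequence[float], tol: float = 1e-12
) -> bool:
    """Return True if r_new is non-negative and each of its partial sums
    is at least the corresponding partial sum of r_ref (Property 1)."""
    if len(r_new) != len(r_ref):
        raise ValueError("rate vectors must have the same number of stages")
    if any(v < 0 for v in r_new):
        return False
    return all(
        a >= b - tol for a, b in zip(accumulate(r_new), accumulate(r_ref))
    )


# Hypothetical three-stage rate vectors (values chosen only for illustration).
r_ref = [1.0, 0.5, 0.25]
moved_down = [1.5, 0.0, 0.25]  # stage-2 rate moved into stage 1
moved_up = [0.5, 1.0, 0.25]    # stage-1 rate moved into stage 2

ok_down = dominates_in_partial_sums(moved_down, r_ref)
ok_up = dominates_in_partial_sums(moved_up, r_ref)
```

In a sum-incremental region, `moved_down` would be guaranteed to belong to the region whenever `r_ref` does, whereas nothing is guaranteed for `moved_up`, whose first partial sum falls short.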

4 Strictly and Generalized Successive Refinability 

Extending the definition of successive refinability given in [5] to an $N$-stage system means the
following.

Definition 5 A source $X$ is said to be $N$-step successively refinable along the distortion vector
$D = (D_1, D_2, \ldots, D_N)$, with side informations $(Y_1, Y_2, \ldots, Y_N)$, if

\[ (R^*_{X|Y_1}(D_1), R^*_{X|Y_2}(D_2) - R^*_{X|Y_1}(D_1), \ldots, R^*_{X|Y_N}(D_N) - R^*_{X|Y_{N-1}}(D_{N-1})) \in \mathcal{R}(D), \tag{20} \]

where $R^*_{X|Y}(\cdot)$ denotes the Wyner-Ziv rate-distortion function for source $X$ with side information
$Y$ at the decoder.

This definition of successive refinability will be referred to as strictly successive refinability,
for reasons that will become clear shortly. The following theorem provides the conditions for
$N$-stage strictly successive refinability.

Theorem 4 A discrete memoryless stochastically degraded source $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$
is $N$-step strictly successively refinable along distortion vector $(D_1, D_2, \ldots, D_N)$, if
and only if there exist random variables $(W_1, W_2, \ldots, W_N)$ and deterministic functions
$f_m : \mathcal{W}_m \times \mathcal{Y}_m \to \hat{\mathcal{X}}$ for $m = 1, 2, \ldots, N$ such that the following conditions hold:

1. $R^*_{X|Y_m}(D_m) = I(X; W_m | Y_m)$ and $E d(X, f_m(W_m, Y_m)) \le D_m$, $1 \le m \le N$;

2. $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$;

3. $(W_1, W_2, \ldots, W_{m-1}) \leftrightarrow (W_m, Y_m) \leftrightarrow X$, $2 \le m \le N$;

4. $I(W_i; Y_m | W_1, W_2, \ldots, W_{i-1}, Y_i) = 0$, $1 \le i \le m - 1$, $2 \le m \le N$.

The conditions reduce to the corresponding conditions for the two-stage case in [5]. Note
that there are in fact a total of $N(N-1)/2$ equalities specified by condition 4).

Proof of Theorem 4

For the necessity, assume (20) holds. By Theorem 1, there exist random variables $(W_1, W_2, \ldots, W_N)$
and maps $f_m : \mathcal{W}_m \times \mathcal{Y}_m \to \hat{\mathcal{X}}$, such that $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$,
and since (20) holds, due to (13) we have,

\[ R^*_{X|Y_m}(D_m) \ge \sum_{i=1}^{m} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i), \quad 1 \le m \le N, \tag{21} \]

and $E d(X, f_m(W_m, Y_m)) \le D_m$, $1 \le m \le N$. From (21), it follows that

\begin{align*}
R^*_{X|Y_m}(D_m) &\ge \sum_{i=1}^{m} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i) \\
&\overset{(a)}{=} \Big[ I(X; W_m | W_1, W_2, \ldots, W_{m-1}, Y_m) + \sum_{i=1}^{m-1} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i) \Big] \\
&\quad + \Big[ \sum_{i=1}^{m-1} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_m) - \sum_{i=1}^{m-1} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_m) \Big] \\
&\overset{(b)}{=} I(X; W_1, W_2, \ldots, W_m | Y_m) + \sum_{i=1}^{m-1} \big[ H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_i) - H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_i, X) \\
&\quad - H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_m) + H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_m, X) \big] \\
&\overset{(c)}{=} I(X; W_1, W_2, \ldots, W_m | Y_m) + \sum_{i=1}^{m-1} \big[ H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_i) - H(W_i | W_1, W_2, \ldots, W_{i-1}, Y_m) \big] \tag{22} \\
&\overset{(d)}{=} I(X; W_1, W_2, \ldots, W_m | Y_m) + \sum_{i=1}^{m-1} I(W_i; Y_m | W_1, W_2, \ldots, W_{i-1}, Y_i) \\
&= I(X; W_m | Y_m) + I(X; W_1, W_2, \ldots, W_{m-1} | Y_m, W_m) + \sum_{i=1}^{m-1} I(W_i; Y_m | W_1, W_2, \ldots, W_{i-1}, Y_i) \\
&\ge R^*_{X|Y_m}(D_m) + \sum_{i=1}^{m-1} I(W_i; Y_m | W_1, W_2, \ldots, W_{i-1}, Y_i) \tag{23} \\
&\ge R^*_{X|Y_m}(D_m), \tag{24}
\end{align*}

where (a) is by the chain rule and adding and subtracting the same term, (b) follows by combining
the first and third terms, (c) is due to the Markov chain relationship $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$;
(d) is also due to the same Markov chain relationship, which implies
we can further condition the last term in (22) on $Y_i$. Next, inequality (23) is due to the
fact that $(W_m, Y_m)$ is sufficient to decode to a distortion $D_m$ while at the same time satisfying
the Markov condition $W_m \leftrightarrow X \leftrightarrow Y_m$. Because the beginning and the end of this chain
of inequalities are equal, all the inequalities must be equalities. For (23), the following two
conditions must be true

\[ I(X; W_m | Y_m) = R^*_{X|Y_m}(D_m), \quad I(X; W_1, W_2, \ldots, W_{m-1} | Y_m, W_m) = 0, \tag{25} \]

which implies $(W_1, W_2, \ldots, W_{m-1}) \leftrightarrow (W_m, Y_m) \leftrightarrow X$ for $2 \le m \le N$. For (24), it must be
true that for $2 \le m \le N$

\[ I(W_i; Y_m | W_1, W_2, \ldots, W_{i-1}, Y_i) = 0, \quad 1 \le i \le m - 1. \tag{26} \]

This establishes the necessity. The sufficiency is of course trivial. The proof is completed. □

Remark 2 Following Remark 1 made after the definition of $\mathcal{R}^*(D)$, we note that if the function
$f_m(W_m, Y_m)$ is indeed given instead as $f'_m(W_1, W_2, \ldots, W_m, Y_m)$, then the third condition
in Theorem 4 will not appear in this set of conditions, and the first condition should be modified
as: $R^*_{X|Y_m}(D_m) = I(X; W_1, W_2, \ldots, W_m | Y_m)$ and $E d(X, f'_m(W_1, W_2, \ldots, W_m, Y_m)) \le D_m$,
$1 \le m \le N$.

In order to introduce the notion of generalized successive refinability, we note that the problem
considered in [6], [7] can be understood in the framework being treated as the projection
of the rate region $\mathcal{R}(D)$ onto the sum-rate $\sum_{i=1}^{N} R_i$, ignoring the individual rates; i.e., it is a
relaxed version of the current problem. Let us denote the sum-rate-distortion function to satisfy
distortion constraint vector $(D_1, D_2, \ldots, D_m)$ with degraded side information $(Y_1, Y_2, \ldots, Y_m)$
as $R_{HB}(D_1, D_2, \ldots, D_m)$, which was given in [6]. Since $R_{HB}(D_1, D_2, \ldots, D_m)$ degenerates to
$R^*_{X|Y_m}(D_m)$ when all the other distortion constraints $(D_1, D_2, \ldots, D_{m-1})$ are set to be infinite,
it is seen that $R_{HB}(D_1, D_2, \ldots, D_m) \ge R^*_{X|Y_m}(D_m)$. Because $R_{HB}(D_1, D_2, \ldots, D_m)$ is a lower
bound for the sum-rate $\sum_{i=1}^{m} R_i$, if $R_{HB}(D_1, D_2, \ldots, D_m) > R^*_{X|Y_m}(D_m)$ for any $m \in I_N$,
then the source is trivially not strictly successively refinable.

From the above discussion, it is seen that for a source to be strictly successively refinable,
two conditions are necessary. The first is that $R_{HB}(D_1, D_2, \ldots, D_m) = R^*_{X|Y_m}(D_m)$, and the
second is that in achieving $(D_1, D_2, \ldots, D_m)$ for side information $(Y_1, Y_2, \ldots, Y_m)$, the encoding
can be performed progressively without rate loss. The first condition in fact provides a simple
necessary condition to check whether a source is successively refinable without directly testing
the conditions in Theorem 4, which can be quite difficult because of the involvement of the random
variables $W_i$.

Theorem 5 A necessary condition for a discrete memoryless stochastically degraded source
$X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$ to be $N$-step strictly successively refinable along distortion
vector $(D_1, D_2, \ldots, D_N)$ is that $R_{HB}(D_1, D_2, \ldots, D_m) = R^*_{X|Y_m}(D_m)$ for each $1 \le m \le N$.

This condition is in fact extremely strict, and it is not satisfied for the following two familiar 
sources in the two stage case. 

• The Gaussian source when the two side informations are not statistically identical. This 
example is treated in more detail in the next section. 

• Doubly-symmetric binary source (DSBS) with Hamming distortion measure, when the
first stage does not have side information. An explicit calculation is given in Section 6.

A natural question arises as to whether the aforementioned second condition can be satisfied
separately, and for this purpose the notion of generalized successive refinability with side
information is defined. This notion can be used to separate the two conditions whose failure
prevents a source from being successively refinable.

Definition 6 A source $X$ is said to be $N$-step generalized successively refinable with degraded
side informations, i.e., $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$, along the distortion vector
$D = (D_1, D_2, \ldots, D_N)$, if

\[ (R_{HB}(D_1), R_{HB}(D_1, D_2) - R_{HB}(D_1), \ldots, R_{HB}(D_1, D_2, \ldots, D_N) - R_{HB}(D_1, D_2, \ldots, D_{N-1})) \in \mathcal{R}(D). \]
The definition is limited to the degraded side information case, because $R_{HB}(D_1, D_2, \ldots, D_N)$
is known under this condition. The notion of generalized successive refinability only considers
whether, in order to achieve distortion $(D_1, D_2, \ldots, D_N)$ with side informations $(Y_1, Y_2, \ldots, Y_N)$,
a progressive encoder is as good as an arbitrary encoder, but ignores whether
$R^*_{X|Y_m}(D_m) = R_{HB}(D_1, D_2, \ldots, D_m)$ is true.
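
The rate vector in Definition 6 is obtained by differencing the cumulative sum-rates $R_{HB}(D_1), R_{HB}(D_1, D_2), \ldots$. A minimal Python sketch of this bookkeeping is given below; the function name `incremental_rates` and the numerical values are hypothetical, and the code does not compute any Heegard-Berger rate itself. It relies only on the fact that adding a distortion constraint cannot decrease the required sum-rate.

```python
from itertools import accumulate
from typing import List, Sequence


def incremental_rates(cumulative: Sequence[float]) -> List[float]:
    """Turn cumulative sum-rates (one per stage) into per-stage rates,
    so that the m-th partial sum of the result equals cumulative[m-1]."""
    rates: List[float] = []
    previous = 0.0
    for total in cumulative:
        if total < previous:
            raise ValueError("cumulative sum-rates must be non-decreasing")
        rates.append(total - previous)
        previous = total
    return rates


# Hypothetical cumulative sum-rates for a three-stage system (illustration only).
cumulative_rates = [1.0, 1.5, 1.75]
stage_rates = incremental_rates(cumulative_rates)
partial_sums = list(accumulate(stage_rates))
```

By construction the partial sums of the returned vector reproduce the cumulative values, which is exactly the telescoping structure of the vector in Definition 6.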

The following theorem makes explicit the connection between strictly successive refinabil- 
ity and the generalized version. 

Theorem 6 A source $X$ is $N$-step strictly successively refinable with degraded side information
along the distortion vector $D = (D_1, D_2, \ldots, D_N)$, if and only if it is $N$-step generalized
successively refinable, and $R_{HB}(D_1, D_2, \ldots, D_m) = R^*_{X|Y_m}(D_m)$ for each $1 \le m \le N$.

Proof of Theorem 6

The sufficiency is trivial, and we only prove the necessity. By definition, we have

\[ r^* = (R^*_{X|Y_1}(D_1), R^*_{X|Y_2}(D_2) - R^*_{X|Y_1}(D_1), \ldots, R^*_{X|Y_N}(D_N) - R^*_{X|Y_{N-1}}(D_{N-1})) \in \mathcal{R}(D). \tag{27} \]

Since $r^*$ is achievable, it must satisfy the following lower bound:

\[ \sum_{i=1}^{m} r^*_i \ge R_{HB}(D_1, D_2, \ldots, D_m), \quad 1 \le m \le N. \tag{28} \]

Define the rate vector

\[ r = (R_{HB}(D_1), R_{HB}(D_1, D_2) - R_{HB}(D_1), \ldots, R_{HB}(D_1, D_2, \ldots, D_N) - R_{HB}(D_1, D_2, \ldots, D_{N-1})), \tag{29} \]

then it follows

\[ \sum_{i=1}^{m} r_i = R_{HB}(D_1, D_2, \ldots, D_m) \ge R^*_{X|Y_m}(D_m) = \sum_{i=1}^{m} r^*_i \ge R_{HB}(D_1, D_2, \ldots, D_m), \quad 1 \le m \le N. \tag{30} \]

Thus the inequalities must be equalities, which gives $R_{HB}(D_1, D_2, \ldots, D_m) = R^*_{X|Y_m}(D_m)$ for
$1 \le m \le N$. The sum-incremental property of the rate-distortion region $\mathcal{R}(D)$ further implies
that $r \in \mathcal{R}(D)$, which completes the proof. □

The next theorem is also straightforward as a consequence of Theorem 1 and the definition
of generalized successive refinability, thus the proof is omitted.

Theorem 7 A discrete memoryless stochastically degraded source $X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$
is $N$-step generalized successively refinable if and only if there exist random variables
$(W_1, W_2, \ldots, W_N)$ satisfying the conditions given for $\mathcal{R}^*(D_1, D_2, \ldots, D_N)$ with

\[ R_{HB}(D_1, D_2, \ldots, D_m) = \sum_{i=1}^{m} I(X; W_i | W_1, W_2, \ldots, W_{i-1}, Y_i), \quad 1 \le m \le N. \tag{31} \]

i=l 
Different from strictly successive refinability with degraded side information in [5] or the
conventional successive refinability without side information [2], there is no Markov condition
involved. Though somewhat surprising at first sight, it is actually straightforward, because
for degraded side informations, the optimal coding scheme naturally employs a progressive
order. However, an arbitrary source is not necessarily generalized successively refinable along a
distortion vector (pair), because a random variable $W_1^*$ that is optimal for the first stage is not
necessarily optimal together with any $W_2$ for the first two stages. For example, any source that
is not successively refinable without side information is not generalized successively refinable
if we take both side informations $Y_1$ and $Y_2$ to be constants.

With the definitions above, we will show in the next section that though the Gaussian source
with different but degraded side informations is not strictly successively refinable, it is indeed
generalized successively refinable. The reason for it to be not strictly successively refinable is
thus only due to the fact that $R_{HB}(D_1, D_2, \ldots, D_j) > R^*_{X|Y_j}(D_j)$ in these cases. Furthermore, we will
show that the same is true for the DSBS. Unlike the conventional successive refinability
without side information, when side information is involved, many familiar sources are very
likely to be not strictly successively refinable unless the side information is identical at all the
stages; however, they are quite likely to be generalized successively refinable.

5 Gaussian Source with Different Side Informations 

We explore the Gaussian source with mean squared error distortion measure in this section.
The calculation will be focused on the two-stage system, which is sufficient for the purpose of
illustrating the two kinds of successive refinability; however, it can be generalized to any finite
number of stages. We emphasize that this derivation is not a trivial extension of the one in [6] when $Y_1$
is a constant, and thus more details are included in Appendix D. Though all the discussions
in the previous sections are for discrete sources, the result can be generalized to the Gaussian
source using the techniques in [13], [14].

We first recall the result in [6] for the two-stage case,

\[ R_{HB}(D_1, D_2) = \min_{p(D_1, D_2)} [I(X; W_1 | Y_1) + I(X; W_2 | W_1, Y_2)], \tag{32} \]

where $p(D_1, D_2)$ is the set of all random variables $(W_1, W_2) \in \mathcal{W}_1 \times \mathcal{W}_2$ jointly distributed
with the generic random variables $(X, Y_1, Y_2)$, such that the following conditions are satisfied:
(1) $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_2 \leftrightarrow Y_1$ is a Markov string; (2) there exist deterministic functions $f_1$
and $f_2$ such that

\[ E d(X, f_1(W_1, Y_1)) \le D_1, \quad E d(X, f_2(W_1, W_2, Y_2)) \le D_2. \]

The source in question is $X \sim \mathcal{N}(0, \sigma_x^2)$, i.e., a zero-mean normal random variable with variance $\sigma_x^2$. Let $Y_1 = X + N_1 + N_2$ and $Y_2 = X + N_2$, where $N_1 \sim \mathcal{N}(0, \sigma_1^2)$, $N_2 \sim \mathcal{N}(0, \sigma_2^2)$, and $X$, $N_1$ and $N_2$ are mutually independent and Gaussian; further assume that $\sigma_1^2, \sigma_2^2 > 0$. To facilitate the discussion, we partition the distortion region into the subregions illustrated in Fig. 4 (see footnote 2), where $D_1^*$, $D_2^*$ and $\gamma$ are defined as

\[ D_1^* = \frac{\sigma_x^2(\sigma_1^2 + \sigma_2^2)}{\sigma_x^2 + \sigma_1^2 + \sigma_2^2}, \qquad D_2^* = \frac{\sigma_x^2 \sigma_2^2}{\sigma_x^2 + \sigma_2^2}, \qquad \gamma = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \]

where it is clear that $D_1^*$ and $D_2^*$ are the variances of the best linear MMSE estimators of $X$ given $Y_1$ and $Y_2$, respectively.

Figure 4: Partition of distortion region for the quadratic Gaussian source.

2 To make the definition of the regions consistent with those in [8], we label the horizontal axis as $D_2$. This convention is also used in the next section.

The regions can be understood as follows.

• Region I: $0 \le D_1 \le D_1^*$, $0 \le D_2 \le D_2^*$ and $D_1 \ge \frac{\gamma\sigma_1^2 D_2}{\gamma\sigma_1^2 - (1-\gamma)^2 D_2}$. In this region both constraints are effective.

• Region II: $D_1 \ge D_1^*$, $0 \le D_2 \le D_2^*$. In this region, the first stage does not have to encode, and the problem degenerates to Wyner-Ziv coding only for the second stage, i.e., $R_1 \ge 0$ and $R_1 + R_2 \ge R^*_{X|Y_2}(D_2)$.

• Region III: $D_1 \le D_1^*$ and $0 \le D_1 \le \frac{\gamma\sigma_1^2 D_2}{\gamma\sigma_1^2 - (1-\gamma)^2 D_2}$ (equivalently, $D_2 \ge \frac{\gamma\sigma_1^2 D_1}{(1-\gamma)^2 D_1 + \gamma\sigma_1^2}$). In this region, the second stage does not have to encode, and the problem degenerates to Wyner-Ziv coding only for the first stage, i.e., $R_1 \ge R^*_{X|Y_1}(D_1)$ and $R_2 \ge 0$.

• Region IV: $D_1 \ge D_1^*$ and $D_2 \ge D_2^*$. This can be achieved with zero rate, since the side informations are enough to satisfy the distortion constraints.
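The partition above can be sketched numerically. The following is a minimal illustration, not part of the original derivation: the function names are hypothetical, and the Region I/III boundary is written in the form $D_2 = \gamma\sigma_1^2 D_1 / ((1-\gamma)^2 D_1 + \gamma\sigma_1^2)$, which is what one obtains by equating the first-stage Wyner-Ziv bound with the sum-rate bound (34). Points lying exactly on a boundary belong to two regions; the assignment below is an arbitrary convention.

```python
def gaussian_thresholds(var_x, var_1, var_2):
    """Return (D1*, D2*, gamma) for Y1 = X + N1 + N2 and Y2 = X + N2."""
    s = var_1 + var_2
    d1_star = var_x * s / (var_x + s)          # MMSE of X given Y1
    d2_star = var_x * var_2 / (var_x + var_2)  # MMSE of X given Y2
    gamma = var_2 / s
    return d1_star, d2_star, gamma


def free_stage2_distortion(d1, var_1, var_2):
    """Largest D2 for which the second constraint is still effective, given D1."""
    gamma = var_2 / (var_1 + var_2)
    return gamma * var_1 * d1 / ((1 - gamma) ** 2 * d1 + gamma * var_1)


def classify_region(d1, d2, var_x, var_1, var_2):
    """Label the distortion pair (D1, D2) with one of the regions I to IV."""
    d1_star, d2_star, _ = gaussian_thresholds(var_x, var_1, var_2)
    if d1 >= d1_star and d2 >= d2_star:
        return "IV"   # side informations alone suffice
    if d1 >= d1_star:
        return "II"   # only the second stage needs to encode
    if d2 >= free_stage2_distortion(d1, var_1, var_2):
        return "III"  # only the first stage needs to encode
    return "I"        # both constraints are effective
```

For instance, with unit variances the thresholds are $D_1^* = 2/3$ and $D_2^* = 1/2$, and the boundary curve passes through the corner point $(D_1^*, D_2^*)$.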

Region I is the only non-degenerate case among the four. In fact, for any distortion pair $(D_1, D_2)$ in Region II, III or IV, there is a distortion pair $(D_1', D_2')$ on the boundary of Region I that strictly improves over $(D_1, D_2)$ and is achievable using the same rates; i.e., $R(D_1, D_2) = R(D_1', D_2')$, and $D_1 \ge D_1'$, $D_2 \ge D_2'$, where at least one of the inequalities holds strictly. Since Region I is the only non-degenerate case, it will be our focus. For the first stage, an obvious lower bound is the Wyner-Ziv rate distortion function, which gives

\[ R_1 \ge R^*_{X|Y_1}(D_1) = \frac{1}{2} \log \frac{\sigma_x^2(\sigma_1^2 + \sigma_2^2)}{D_1(\sigma_x^2 + \sigma_1^2 + \sigma_2^2)}. \tag{33} \]

Using $R_{HB}(D_1, D_2)$ as the lower bound on the sum rate, we have

\[ R_1 + R_2 \ge R_{HB}(D_1, D_2) = \frac{1}{2} \log \frac{\sigma_x^2 \sigma_1^2 \sigma_2^2}{D_2(\sigma_x^2 + \sigma_1^2 + \sigma_2^2)\left((1-\gamma)^2 D_1 + \gamma\sigma_1^2\right)}, \tag{34} \]

for which the rate distortion function $R_{HB}(D_1, D_2)$ is proved in Appendix D.
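The two bounds can be evaluated side by side. The sketch below assumes logarithms to base 2 (rates in bits), which the text does not specify, and uses hypothetical function names; the expressions are only meaningful for distortion pairs in Region I.

```python
import math


def stage1_wz_rate(d1, var_x, var_1, var_2):
    """Wyner-Ziv lower bound on R1, in bits (assumes 0 < D1 <= D1*)."""
    s = var_1 + var_2
    return 0.5 * math.log2(var_x * s / (d1 * (var_x + s)))


def hb_sum_rate(d1, d2, var_x, var_1, var_2):
    """Sum-rate lower bound R_HB(D1, D2) in Region I, in bits."""
    s = var_1 + var_2
    gamma = var_2 / s
    num = var_x * var_1 * var_2
    den = d2 * (var_x + s) * ((1 - gamma) ** 2 * d1 + gamma * var_1)
    return 0.5 * math.log2(num / den)


def stage2_wz_rate(d2, var_x, var_2):
    """Wyner-Ziv rate for the second stage alone, in bits (assumes 0 < D2 <= D2*)."""
    d2_star = var_x * var_2 / (var_x + var_2)
    return 0.5 * math.log2(d2_star / d2)
```

With unit variances and $(D_1, D_2) = (0.4, 0.2)$, an interior point of Region I, the sum-rate bound exceeds both the first-stage bound and the second-stage Wyner-Ziv rate, which is consistent with the loss of strict successive refinability discussed in this section.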

Not surprisingly, the following pair of random variables actually achieves the lower bounds on $R_1$ and $R_1 + R_2$ simultaneously in Region I:

\[ W_1 = X + Z_1 + Z_2, \qquad W_2 = X + Z_2, \]

where $Z_1$, $Z_2$ are mutually independent zero-mean Gaussian random variables, independent of $(X, N_1, N_2)$, with a proper choice of variances determined by $D_1$, $D_2$, $\sigma_x^2$, $\sigma_1^2$, $\sigma_2^2$. Alternatively, one can check that this choice of $W_1$ and $W_2$ makes all the inequalities in the lower bounding derivation hold with equality, and thus achieves the lower bound.
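The "proper choice of variances" can be made concrete under the assumption that each decoder uses the MMSE estimate, so that the precisions (inverse variances) of independent Gaussian observations of $X$ add. The sketch below, with hypothetical function names and rates in bits, solves for the variances of $Z_1$ and $Z_2$ and evaluates the two mutual information terms directly.

```python
import math


def channel_noise_variances(d1, d2, var_x, var_1, var_2):
    """Variances of Z1 and Z2 such that the MMSE distortions are exactly (D1, D2)."""
    s = var_1 + var_2
    var_z2 = 1.0 / (1.0 / d2 - 1.0 / var_x - 1.0 / var_2)     # Var(X | W2, Y2) = D2
    var_z12 = 1.0 / (1.0 / d1 - 1.0 / var_x - 1.0 / s)        # Var(X | W1, Y1) = D1
    return var_z12 - var_z2, var_z2


def achieved_rates(d1, d2, var_x, var_1, var_2):
    """Return (I(X;W1|Y1), I(X;W2|W1,Y2)) in bits for the Gaussian test channel."""
    var_z1, var_z2 = channel_noise_variances(d1, d2, var_x, var_1, var_2)
    s = var_1 + var_2
    v_y1 = 1.0 / (1.0 / var_x + 1.0 / s)                               # Var(X | Y1)
    v_w1_y1 = 1.0 / (1.0 / var_x + 1.0 / s + 1.0 / (var_z1 + var_z2))  # Var(X | W1, Y1)
    v_w1_y2 = 1.0 / (1.0 / var_x + 1.0 / var_2 + 1.0 / (var_z1 + var_z2))
    # W1 = W2 + Z1 is a degraded version of W2, so Var(X | W1, W2, Y2) = Var(X | W2, Y2).
    v_w2_y2 = 1.0 / (1.0 / var_x + 1.0 / var_2 + 1.0 / var_z2)
    return 0.5 * math.log2(v_y1 / v_w1_y1), 0.5 * math.log2(v_w1_y2 / v_w2_y2)
```

In Region I the variance of $Z_1$ returned by this computation is non-negative, and the two rates add up to the right-hand side of (34); outside Region I the computed variance of $Z_1$ becomes negative, signalling that this test channel is no longer valid.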

From the above discussion, it is clear that this choice of $W_1$ and $W_2$ satisfies the condition of Theorem 7, and thus the Gaussian source is indeed generalized successively refinable. However, in the interior of Region I, $R_{HB}(D_1, D_2)$ is strictly larger than $R^*_{X|Y_2}(D_2)$, which by Theorem 6 implies that the Gaussian source is not successively refinable in the strict sense for these distortion pairs. On the boundary between Regions I and II, as well as in Region II, $R_{HB}(D_1, D_2) = R^*_{X|Y_2}(D_2)$, so the source is indeed successively refinable in the strict sense for these distortion pairs; however, this degenerate case is less interesting.



6 The Doubly-symmetric Binary Source 

In this section we consider the following special case: $X$ is a DMS with alphabet $\{0, 1\}$, and $P(X = 0) = P(X = 1) = 0.5$. The side information is $Y_2 = Y = X \oplus N$, where $N$ is a Bernoulli random variable independent of everything else with $P(N = 1) = p < 0.5$ and $\oplus$ stands for modulo 2 addition; alternatively, $Y$ can be taken as the output of a binary symmetric channel with input $X$ and crossover probability $p$. $Y_1$ is a constant, i.e., there is no side information at the first stage. The distortion measure is the Hamming distortion $d(x, \hat{x}) = x \oplus \hat{x}$.

As in the Gaussian case, the function $R_{HB}(D_1, D_2)$ plays a significant role for this source. We digress here to give a brief review of this particular problem. The DSBS source, which is probably the simplest discrete source in the side information scenario, provided considerable insight into the Wyner-Ziv problem [4]. Somewhat surprisingly, an explicit calculation of $R_{HB}(D_1, D_2)$ was not found for this source. Heegard and Berger postulated a forward test channel in [6], which was later shown not to be optimal by Kerpez [8]. Kerpez provided upper and lower bounds, neither of which is tight. Fleming and Effros [9] also contributed to this problem by considering it as a rate distortion problem with mixed types of side information. An algorithm to compute the rate-distortion function numerically was further devised in [10]. However, an explicit expression of the rate distortion function for this source, and more importantly the corresponding optimal forward test channel structure, have not been given in the literature. In the process of considering our problem for the DSBS case, we give an explicit solution to the Heegard-Berger problem as well.

In this section we first explicitly calculate $R_{HB}(D_1, D_2)$, and then apply the result to the successive refinement coding case, where it will be shown that the DSBS is indeed generalized successively refinable.

Figure 5: The four parts of the rate-distortion regions. $d_c$ is the critical distortion defined in [4].

6.1 $R_{HB}(D_1, D_2)$ for the DSBS source

As in the Gaussian case considered in Section 5, it was shown in [8] (see footnote 3) that the rate distortion region can be partitioned into four subregions, three of which are degenerate (see Fig. 5).

• Region I: $0 \le D_1 \le 0.5$ and $0 \le D_2 \le \min(D_1, p)$. In this region $R(D_1, D_2)$ is a function of both $D_1$ and $D_2$, and it is the only non-degenerate case.

• Region II: $D_1 \ge 0.5$ and $0 \le D_2 \le p$. Here the first stage does not have to encode and therefore the problem degenerates to Wyner-Ziv encoding for the second stage.

• Region III: $0 \le D_1 \le 0.5$ and $D_2 \ge \min(D_1, p)$. Here the second stage does not have to encode and hence the problem degenerates to rate-distortion encoding for the first stage.

• Region IV: $D_1 \ge 0.5$ and $D_2 \ge p$. Clearly the rate is zero since the distortion constraints are trivially met.

We will need the following function from [4], defined on the domain $0 \le u \le 1$,

\[ G(u) = h(p * u) - h(u), \]

where $h(u)$ is the binary entropy function $h(u) = -u \log u - (1-u) \log(1-u)$ and $u * v$ is the binary convolution, defined for $0 \le u, v \le 1$ as $u * v = u(1-v) + v(1-u)$. We will be interested only in the case $0 < p < 0.5$. It was shown in [4] that $G(u)$ is (strictly) convex; furthermore, it is easy to show that $G(u)$ is symmetric about 0.5 and monotonically decreasing for $0 \le u \le 0.5$; the minimum of $G(u)$ is zero, attained at $u = 0.5$. It was also shown in [4] (see footnote 4) that for $0 \le D \le p$

\[ R^*_{X|Y}(D) = \min_{(\theta, \beta):\; 0 \le \theta \le 1,\; 0 \le \beta \le p,\; D = \theta\beta + (1-\theta)p} \left[ \theta G(\beta) \right]. \tag{35} \]
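The quantities just defined are easy to evaluate numerically. The sketch below assumes base-2 logarithms and approximates the minimization in (35) by a simple grid search over $\beta$, solving the constraint for $\theta$; the function names and the grid resolution are illustrative choices, and the result is only an upper approximation of the true minimum.

```python
import math


def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)


def conv(u, v):
    """Binary convolution u * v = u(1 - v) + v(1 - u)."""
    return u * (1.0 - v) + v * (1.0 - u)


def G(u, p):
    """G(u) = h(p * u) - h(u)."""
    return h(conv(p, u)) - h(u)


def wyner_ziv_dsbs(d, p, grid=2000):
    """Grid approximation of min theta * G(beta) subject to d = theta*beta + (1-theta)*p."""
    if d >= p:
        return 0.0  # the side information alone already achieves distortion p
    best = float("inf")
    for i in range(grid + 1):
        beta = d * i / grid               # feasibility of theta <= 1 requires beta <= d
        theta = (p - d) / (p - beta)      # time-sharing weight solving the constraint
        best = min(best, theta * G(beta, p))
    return best
```

At $D = 0$ the search returns $h(p)$, and for any $D$ it never exceeds $G(D)$, since $\theta = 1$, $\beta = D$ is always feasible.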



3 Note that the constraints $D_1$ and $D_2$, which are the first and second stage distortions here, correspond to $D_2$ and $D_1$ defined in [8], respectively.

4 In [4], the minimization was given instead as an infimum with the feasible range $0 \le \beta < p$, but it can be shown that for $D_2 \le p$ these two forms are equivalent.

We next define the following function

\[ S_{D_1}(\alpha, \beta, \theta, \theta_1) = 1 - h(D_1 * p) + (\theta - \theta_1) G(\alpha) + \theta_1 G(\beta) + (1 - \theta) G(\gamma), \]

where

\[ \gamma = \begin{cases} \dfrac{D_1 - (\theta - \theta_1)(1 - \alpha) - \theta_1 \beta}{1 - \theta}, & \theta \ne 1, \\ 0.5, & \theta = 1, \end{cases} \]

on the domain

\[ 0 \le \theta_1 \le \theta \le 1, \qquad 0 \le \alpha, \beta \le p, \qquad p \le \gamma \le 1 - p. \]

Notice that $S_{D_1}(\cdot)$ is continuous at $\theta = 1$.

The following theorem characterizes the rate distortion function $R_{HB}(D_1, D_2)$ in Region I.

Theorem 8 For distortion pairs $(D_1, D_2)$ in Region I:

\[ R_{HB}(D_1, D_2) = \min S_{D_1}(\alpha, \beta, \theta, \theta_1) = S^*(D_1, D_2), \]

where the minimization is over the domain of $S_{D_1}(\alpha, \beta, \theta, \theta_1)$, subject to the constraint

\[ (\theta - \theta_1)\alpha + \theta_1 \beta + (1 - \theta) p = D_2. \]

This theorem is proved in Appendix E. One notable consequence of the proof of the forward part of this theorem is that $W_1$ can always be taken as the output of a BSC with crossover probability $D_1$ and input $X$. This observation is important in determining whether this source is generalized successively refinable.

The following two corollaries are useful and follow straightforwardly from Theorem 8; they are also proved in Appendix E. The first corollary provides a lower bound for $R_{HB}(D_1, D_2)$, which is easy to compute and usually tighter than the one given in [8].

Corollary 1 For distortion pairs $(D_1, D_2)$ in Region I:

\[ R_{HB}(D_1, D_2) \ge 1 - h(D_1 * p) + R^*_{X|Y}(D_2). \]

Next recall the definition of the critical distortion $d_c$ in the Wyner-Ziv problem for the DSBS source, where

\[ \frac{G(d_c)}{d_c - p} = G'(d_c). \]
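The defining equation of $d_c$ has no closed-form solution, but it can be solved numerically. The sketch below is one possible approach under stated assumptions: base-2 logarithms, a plain bisection on $f(d) = G(d) - (d - p)G'(d)$ over $(0, p)$, and hypothetical function names. Since $G$ is strictly convex, $f'(d) = (p - d)G''(d) > 0$ on this interval, so $f$ is increasing and bisection is applicable whenever $f$ is negative at the left end point, which holds for moderate values of $p$ such as the one used here.

```python
import math


def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)


def conv(u, v):
    """Binary convolution u * v = u(1 - v) + v(1 - u)."""
    return u * (1.0 - v) + v * (1.0 - u)


def G(u, p):
    return h(conv(p, u)) - h(u)


def G_prime(u, p):
    """Derivative of G with respect to u, for 0 < u < 1."""
    c = conv(p, u)
    return (1.0 - 2.0 * p) * math.log2((1.0 - c) / c) - math.log2((1.0 - u) / u)


def critical_distortion(p, tol=1e-12):
    """Solve G(d)/(d - p) = G'(d) for d in (0, p) by bisection."""
    f = lambda d: G(d, p) - (d - p) * G_prime(d, p)
    lo, hi = 1e-12, p          # f(lo) < 0 and f(p) = G(p) > 0 for the values of p used here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Geometrically, $d_c$ is the point at which the tangent to $G$ passes through $(p, 0)$, which is where time sharing with the zero-rate point stops being beneficial in the Wyner-Ziv problem.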

We have the following corollary, which specifies a simple forward test channel structure for the case $D_2 \le d_c$.

Corollary 2 For distortion pairs $(D_1, D_2)$ such that $D_1 \le 0.5$ and $D_2 \le \min(d_c, D_1)$ (i.e., Region I-B),

\[ R_{HB}(D_1, D_2) = 1 - h(D_1 * p) + G(D_2). \]

From the proof of Corollary 2, it is seen that the optimal forward test channel for this case is in fact a cascade of two BSCs, as depicted in Fig. 6.

Y -- BSC -- X -- BSC -- W_2 -- BSC -- W_1

Figure 6: The optimal forward test channel in Region I-B. The crossover probability for the BSC between $X$ and $W_2$ is $D_2$, while the crossover probability $\eta$ for the BSC between $W_2$ and $W_1$ is such that $D_2 * \eta = D_1$.
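As a sanity check on the cascade structure, one can plug the corresponding parameters into $S_{D_1}$ and confirm that the expression of Corollary 2 is recovered. The sketch below assumes base-2 logarithms and hypothetical function names; the parameter choice $\theta = 1$, $\alpha = \beta = D_2$, $\theta_1 = 1 - \eta$ is our reading of the cascade of Fig. 6, and the code does not check the condition $D_2 \le d_c$ under which the corollary applies.

```python
import math


def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)


def conv(u, v):
    """Binary convolution u * v = u(1 - v) + v(1 - u)."""
    return u * (1.0 - v) + v * (1.0 - u)


def G(u, p):
    return h(conv(p, u)) - h(u)


def S(d1, alpha, beta, theta, theta1, p):
    """The function S_{D1}(alpha, beta, theta, theta1), with gamma = 0.5 when theta = 1."""
    if theta < 1.0:
        gamma = (d1 - (theta - theta1) * (1.0 - alpha) - theta1 * beta) / (1.0 - theta)
    else:
        gamma = 0.5
    return (1.0 - h(conv(d1, p)) + (theta - theta1) * G(alpha, p)
            + theta1 * G(beta, p) + (1.0 - theta) * G(gamma, p))


def cascade_parameters(d1, d2):
    """Crossover eta of the W2 -> W1 BSC and the induced (alpha, beta, theta, theta1)."""
    eta = (d1 - d2) / (1.0 - 2.0 * d2)   # solves conv(d2, eta) = d1
    return eta, (d2, d2, 1.0, 1.0 - eta)
```

With $\theta = 1$ the constraint of Theorem 8 reduces to $(1 - \theta_1)\alpha + \theta_1\beta = D_2$, which the parameters above satisfy trivially, and $S_{D_1}$ collapses to $1 - h(D_1 * p) + G(D_2)$.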

6.2 Successive Refinability for the DSBS Source 

From Corollary 1 it is evident that $R_{HB}(D_1, D_2) > R^*_{X|Y}(D_2)$ unless $D_1 = 0.5$, which implies that the DSBS is not strictly successively refinable; however, it is generalized successively refinable. This is true because Theorem 8 and its proof imply that $W_1$ can always be taken as the output of a BSC with crossover probability $D_1$ and input $X$. This $W_1$ and the optimal $W_2$ clearly satisfy the condition in Theorem 7; thus the DSBS is indeed generalized successively refinable.

7 Conclusion 

We provided a characterization of the rate-distortion region for the multistage successive refinement of the Wyner-Ziv problem with degraded side information, which was left open in [5]. A systematic comparison with the achievable region given in [5] was provided, and the equivalence was established precisely. We also established a source-channel separation theorem when descriptions are transmitted over independent channels. Conditions for (strict) successive refinability were derived accordingly. The notion of generalized successive refinability was introduced in order to delineate the two factors that cause a source to fail to be successively refinable. We showed that the Gaussian source with multiple side informations, as well as the doubly symmetric binary source when the first stage does not have side information, are in fact generalized successively refinable, but not strictly successively refinable. As such, their failure to be successively refinable is due only to the uncertainty about which side information will occur, and not to the progressive encoding requirement.

A Proof of the Converse of Theorem 1 

There are a total of $N$ rate constraint inequalities. We consider bounding the rate sum $\sum_{i=1}^m R_i$ for a given $m$, where $1 \le m \le N$. Assuming the existence of an $(n, M_1, M_2, \ldots, M_N, D_1, D_2, \ldots, D_N)$ SR code, there exist encoding and decoding functions $\phi_i$ and $\psi_i$ for $1 \le i \le N$. Denote $\phi_i(X^n)$ as $T_i$. We will use the notation $T_i^j$ to denote the vector $(T_i, T_{i+1}, \ldots, T_j)$ when $i \le j$; if $i > j$, we take the convention that $T_i^j$ is the empty set $\emptyset$. $(X_1, X_2, \ldots, X_n)$ will be denoted as $X$ and $(Y_{j,1}, Y_{j,2}, \ldots, Y_{j,n})$ as $Y_j$. $X_k^-$ will be used to denote the vector $(X_1, X_2, \ldots, X_{k-1})$ and $X_k^+$ to denote $(X_{k+1}, X_{k+2}, \ldots, X_n)$. For a collection of side informations, denote $((Y_i)_k^-, (Y_{i+1})_k^-, \ldots, (Y_j)_k^-)$ as $(Y_i^j)_k^-$, and similarly for $(Y_i^j)_k^+$; they will be combined when necessary and denoted as $(Y_i^j)_k^{\pm}$. The subscript $k$ will be dropped when it is obvious from the context. $(Y_i^j)_k$ is understood as the vector $(Y_{i,k}, Y_{i+1,k}, \ldots, Y_{j,k})$. We will assume $m > 2$ such that the quantities exist in the following proof, but it is straightforward to verify for $m = 1, 2$ that the derivation degenerates in the correct way.

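The following chain of inequalities is straightforward:

\begin{align}
n \sum_{i=1}^m R_i &\ge H(T_1^m) \notag \\
&\ge H(T_1^m|Y_1) \overset{(a)}{=} H(T_1^m|Y_1) - H(T_1^m|Y_1, X) \notag \\
&= I(X; T_1^m|Y_1) \tag{36} \\
&= I(X; T_1^m Y_2^m|Y_1) - \sum_{j=2}^m I(X; Y_j|T_1^m Y_1^{j-1}) \tag{37} \\
&= \sum_{k=1}^n \Big[ I(X_k; T_1^m Y_2^m|Y_1 X_k^-) - \sum_{j=2}^m I(X; Y_{j,k}|T_1^m Y_1^{j-1} (Y_j)_k^-) \Big], \tag{38}
\end{align}

where (a) holds because the index is a function of the source, and the last two equalities follow from the chain rule for mutual information. Define the term in the outer summation of (38) as $\Gamma_k$, i.e.,

\[ \Gamma_k = I(X_k; T_1^m Y_2^m|Y_1 X_k^-) - \sum_{j=2}^m I(X; Y_{j,k}|T_1^m Y_1^{j-1} (Y_j)_k^-). \tag{39} \]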

i=2 

For simplicity, from here on we will drop the subscript $k$ when we refer to the sequences, e.g., we will denote $X_k^-$ by $X^-$ and $(Y_j)_k^-$ by $Y_j^-$. We will work primarily with $\Gamma_k$ until the very end of the proof. For the first term in $\Gamma_k$,

\[ I(X_k; T_1^m Y_2^m|Y_1 X^-) \overset{(a)}{=} I(X_k; T_1^m Y_2^m Y_1^{\pm} X^-|Y_{1,k}) \ge I(X_k; T_1^m Y_2^m Y_1^{\pm}|Y_{1,k}), \tag{40} \]

where (a) follows from the fact that $(X_k, Y_{1,k})$ is independent of $(X^-, Y_1^{\pm})$. Because of the Markov string $Y_{j,k} \leftrightarrow (X_k, (Y_1^{j-1})_k) \leftrightarrow (T_1^m X^{\pm} (Y_1^{j-1})^{\pm} Y_j^-)$, for each term in the negative summation in $\Gamma_k$ we have

\[ I(X; Y_{j,k}|T_1^m Y_1^{j-1} Y_j^-) = I(X_k; Y_{j,k}|T_1^m Y_1^{j-1} Y_j^-). \tag{41} \]



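Combining (40) and (41), it follows that

\[ \Gamma_k \ge I(X_k; T_1^m Y_2^m Y_1^{\pm}|Y_{1,k}) - \sum_{j=2}^m I(X_k; Y_{j,k}|T_1^m Y_1^{j-1} Y_j^-). \tag{42} \]

Applying the chain rule to the positive term on the right hand side of (42), we have

\[ I(X_k; T_1^m Y_2^m Y_1^{\pm}|Y_{1,k}) = I(X_k; T_1^m Y_1^{\pm} Y_2^-|Y_{1,k}) + I(X_k; Y_{2,k} Y_2^+ Y_3^m|T_1^m Y_1 Y_2^-). \tag{43} \]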
For the second term in (43), we have

\begin{align}
I(X_k; Y_{2,k} Y_2^+ Y_3^m|T_1^m Y_1 Y_2^-) &= I(X_k; Y_{2,k}|T_1^m Y_1 Y_2^-) + I(X_k; Y_2^+ Y_3^m|T_1^m Y_1 Y_2^- Y_{2,k}) \notag \\
&= I(X_k; Y_{2,k}|T_1^m Y_1 Y_2^-) + I(X_k; Y_2^+ Y_3^-|T_1^m Y_1 Y_2^- Y_{2,k}) + I(X_k; Y_{3,k} Y_3^+ Y_4^m|T_1^m Y_1^2 Y_3^-) \notag \\
&= I(X_k; Y_{2,k}|T_1^m Y_1 Y_2^-) + I(X_k; Y_2^+ Y_3^-|T_1^m Y_1 Y_2^- Y_{2,k}) \notag \\
&\quad + I(X_k; Y_{3,k}|T_1^m Y_1^2 Y_3^-) + I(X_k; Y_3^+ Y_4^m|T_1^m Y_1^2 Y_3^- Y_{3,k}). \tag{44}
\end{align}

Continuing this decomposition, it finally gives

\begin{align}
I(X_k; Y_{2,k} Y_2^+ Y_3^m|T_1^m Y_1 Y_2^-) &= \sum_{j=2}^m I(X_k; Y_{j,k}|T_1^m Y_1^{j-1} Y_j^-) \notag \\
&\quad + \sum_{j=2}^{m-1} I(X_k; Y_j^+ Y_{j+1}^-|T_1^m Y_1^{j-1} Y_j^- Y_{j,k}) + I(X_k; Y_m^+|T_1^m Y_1^{m-1} Y_m^- Y_{m,k}). \tag{45}
\end{align}

Substituting this in (43), we get

\begin{align}
I(X_k; T_1^m Y_2^m Y_1^{\pm}|Y_{1,k}) &= I(X_k; T_1^m Y_1^{\pm} Y_2^-|Y_{1,k}) + \sum_{j=2}^m I(X_k; Y_{j,k}|T_1^m Y_1^{j-1} Y_j^-) \notag \\
&\quad + \sum_{j=2}^{m-1} I(X_k; Y_j^+ Y_{j+1}^-|T_1^m Y_1^{j-1} Y_j^- Y_{j,k}) + I(X_k; Y_m^+|T_1^m Y_1^{m-1} Y_m^- Y_{m,k}). \tag{46}
\end{align}

Therefore, substituting (46) into (42), we see that the negative term in (42) cancels the second term on the RHS of (46), which gives

\begin{align}
\Gamma_k &\ge I(X_k; T_1^m Y_1^{\pm} Y_2^-|Y_{1,k}) \notag \\
&\quad + \sum_{j=2}^{m-1} I(X_k; Y_j^+ Y_{j+1}^-|T_1^m Y_1^{j-1} Y_j^- Y_{j,k}) + I(X_k; Y_m^+|T_1^m Y_1^{m-1} Y_m^- Y_{m,k}). \tag{47}
\end{align}

For the first term in (47), we have

\[ I(X_k; T_1^m Y_1^{\pm} Y_2^-|Y_{1,k}) = I(X_k; T_1 Y_1^{\pm}|Y_{1,k}) + I(X_k; T_2^m Y_2^-|T_1 Y_1). \tag{48} \]
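We claim that

\[ I(X_k; T_2^m Y_2^-|T_1 Y_1) \ge I(X_k; T_2^m Y_2^-|T_1 Y_1 Y_{2,k}), \tag{49} \]

and more generally for $2 \le j \le m$

\[ I(X_k; T_j^m Y_j^-|T_1^{j-1} Y_1^{j-1}) \ge I(X_k; T_j^m Y_j^-|T_1^{j-1} Y_1^{j-1} Y_{j,k}), \tag{50} \]

which can be justified as follows:

\begin{align}
& I(X_k; T_j^m Y_j^-|T_1^{j-1} Y_1^{j-1}) - I(X_k; T_j^m Y_j^-|T_1^{j-1} Y_1^{j-1} Y_{j,k}) \notag \\
&= H(X_k|T_1^{j-1} Y_1^{j-1}) - H(X_k|T_1^m Y_1^{j-1} Y_j^-) - H(X_k|T_1^{j-1} Y_1^{j-1} Y_{j,k}) + H(X_k|T_1^m Y_1^{j-1} Y_j^- Y_{j,k}) \notag \\
&= I(X_k; Y_{j,k}|T_1^{j-1} Y_1^{j-1}) - I(X_k; Y_{j,k}|T_1^m Y_j^- Y_1^{j-1}) \notag \\
&= H(Y_{j,k}|T_1^{j-1} Y_1^{j-1}) - H(Y_{j,k}|X_k T_1^{j-1} Y_1^{j-1}) - H(Y_{j,k}|T_1^m Y_j^- Y_1^{j-1}) + H(Y_{j,k}|X_k T_1^m Y_j^- Y_1^{j-1}) \notag \\
&\overset{(a)}{=} I(Y_{j,k}; T_j^m Y_j^-|T_1^{j-1} Y_1^{j-1}) \ge 0, \tag{51}
\end{align}

where (a) holds because the Markov condition $Y_{j,k} \leftrightarrow (X_k, (Y_1^{j-1})_k) \leftrightarrow (T_1^m Y_j^- (Y_1^{j-1})^{\pm} X^{\pm})$ implies the reduced Markov condition $Y_{j,k} \leftrightarrow (X_k, (Y_1^{j-1})_k) \leftrightarrow (T_1^m Y_j^- (Y_1^{j-1})^{\pm})$. Assume for now $m > 2$, and consider the following summation of the second term in (48) and the second term in (47):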

m— 1 

i=2 

(a) m—l 

> J(X fc ; ^-1^^^,) + ^ I(X k ; Y+Y^T^Y^YrY^) 

i=2 

= J(X fc ; T^Y 2 ~\T 1 Y 1 Y 2jk ) + I(X k ; Y^Yf^Y^'Y^) 

m—l 

+ £ HXk, Y+Yr^Yi^YfY^) 

3=3 

m—l 

i I(X k ; T^Y^T^Y^) + £ J(X fe ; y/^Trr/'- 1 ^. fc ) 

i=3 

m—l 

= J(x fc; t^It^y^) + /(x fe ; r 3 -r 3 -|r 1 2 F 1 2 ) + ^ /(X fc ; Y+Y^T^Y^YrY^), 

(52) 

where (a) follows because of (50) and (b) follows from the chain rule. Notice that for the second term in (52) we can again apply inequality (50), and continue sequentially in this way, which finally gives

m—l 

J(X fe ; T^Y-l^Y,) + £ J(X fc ; Y+I^- ^Yj^YfY^) 

m—l 

> £ I(X k ; T.Y^Tl-'Yt^) + I(X k] T m Y~\Tr 'Y?' 1 ) (53) 
Combining (47), (48) and (53) gives

m—l 

T k > I(X k ; T x Y^\Y ltk ) + T (X k ; TjY^Tl^Yt^) 

i=2 

+I(X k ; TnY-lT^Yf 1 - 1 ) + I(X k ; Y+^Y^Y-Y^) (54) 

m 

> ^/(X^T.r/lTr 1 ^- 1 ^). (55) 

i=i 

where inequality (50) is applied to the third term in (54). It is straightforward to verify that inequality (55) is still valid if $m = 1$ or $m = 2$, when the proper convention for the empty set is taken.

In (55), the conditioning on $(Y_1^{j-1})_k$ has to be removed to reach the desired form, which can indeed be done due to the degradedness of the side informations. More precisely, for $2 \le j \le m$

I(X k] T^lTt'Yt 1 ^) - I(X k ; T^/lTf- 1 W 1 )^) 
= H(X k \Tt x Yt X Y jtk ) - H{X k \TiYi) - (X^TrW'- 1 )^) + # M^W)^) 
= -I(X k ; (r/- 1 ) fc |Tr 1 (Y/- 1 ) ± F J , fe ) + I(X k ; (XtWniY^)^) = (56) 

where in fact both terms in (56) are zero, because the Markov condition $(Y_1^{j-1})_k \leftrightarrow Y_{j,k} \leftrightarrow (X\, T_1^m (Y_1^m)^{\pm})$ implies the reduced Markov condition $(Y_1^{j-1})_k \leftrightarrow Y_{j,k} \leftrightarrow (X_k T_1^j (Y_1^j)^{\pm})$. Thus we reach the form

r fc > ^/(x^T^/irr 1 ^'- 1 )^) = ^/(x fc ;T 1 ^ i ± |rr 1 W- 1 ) ± ^ fc ).(57) 

Define fc = (T{, {Y j)h) and b y substituting <|571) into OHJ we nave for 1 < m < iV, 

m n m 

> SS^^^IW" 1 )*'^.*) (58) 
i=l fc=l j=l 

Therefore the Markov condition $(W_{1,k}, W_{2,k}, \ldots, W_{N,k}) \leftrightarrow X_k \leftrightarrow Y_{N,k} \leftrightarrow Y_{N-1,k} \leftrightarrow \cdots \leftrightarrow Y_{1,k}$ is true. Next introduce the time sharing random variable $Q$, which is independent of the multisource and uniformly distributed over $I_n$. Define $W_j = (W_{j,Q}, Q)$. The existence of the function $f_j$ follows by defining

\[ f_j(W_j, Y_j) = \psi_{j,Q}(\phi_1(X), \phi_2(X), \ldots, \phi_j(X), Y_j), \tag{59} \]

because $W_j$ includes $T_1^j$ and $Y_j^{\pm}$, which leads to the fulfillment of the distortion constraint

\[ E\,d(X, f_j(W_j, Y_j)) = \frac{1}{n} \sum_{i=1}^n E\,d(X_i, \psi_{j,i}(\phi_1(X), \phi_2(X), \ldots, \phi_j(X), Y_j)) \le D_j, \qquad 1 \le j \le N, \tag{60} \]

and the Markov condition $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow Y_N \leftrightarrow Y_{N-1} \leftrightarrow \cdots \leftrightarrow Y_1$ is still true. It only remains to show that the bound (58) can be written in single-letter form in $W_j$, but this is straightforward following the approach on pg. 435 of [15] (see also [5]). The bounds on the alphabet sizes follow by applying the conventional argument (see [16]). This completes the proof. □



B Proof of Theorem 2 

The forward part is trivially implied by Theorem 1 and the conventional channel coding theorem, and thus we only give an outline of the converse part.
By Lemma 8.9.2 in [15], we have

\[ \sum_{i=1}^m n_i C_i \ge \sum_{i=1}^m I(X_{c,i}^{n_i}; Y_{c,i}^{n_i}), \tag{61} \]

where $n_i = n\rho_i$, and $\rho_i$ is the number of channel uses per source symbol for the $i$-th channel. Notice that

\begin{align}
& I(X_{c,1}^{n_1}, X_{c,2}^{n_2}, \ldots, X_{c,m}^{n_m}; Y_{c,1}^{n_1}, Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}) \notag \\
&\overset{(a)}{=} I(X_{c,1}^{n_1}, \ldots, X_{c,m}^{n_m}; Y_{c,1}^{n_1}) + I(X_{c,1}^{n_1}, \ldots, X_{c,m}^{n_m}; Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|Y_{c,1}^{n_1}) \notag \\
&\overset{(b)}{=} I(X_{c,1}^{n_1}; Y_{c,1}^{n_1}) + I(X_{c,1}^{n_1}, \ldots, X_{c,m}^{n_m}; Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|Y_{c,1}^{n_1}) \notag \\
&= I(X_{c,1}^{n_1}; Y_{c,1}^{n_1}) + H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|Y_{c,1}^{n_1}) - H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|Y_{c,1}^{n_1}, X_{c,1}^{n_1}, \ldots, X_{c,m}^{n_m}) \notag \\
&\overset{(c)}{=} I(X_{c,1}^{n_1}; Y_{c,1}^{n_1}) + H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|Y_{c,1}^{n_1}) - H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|X_{c,2}^{n_2}, \ldots, X_{c,m}^{n_m}) \notag \\
&\overset{(d)}{\le} I(X_{c,1}^{n_1}; Y_{c,1}^{n_1}) + H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}) - H(Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}|X_{c,2}^{n_2}, \ldots, X_{c,m}^{n_m}) \notag \\
&= I(X_{c,1}^{n_1}; Y_{c,1}^{n_1}) + I(X_{c,2}^{n_2}, \ldots, X_{c,m}^{n_m}; Y_{c,2}^{n_2}, \ldots, Y_{c,m}^{n_m}), \tag{62}
\end{align}

where (a) is by the chain rule, and (b) and (c) hold because the channels are independent, i.e.,

\[ P_{Y_{c,1} Y_{c,2} \ldots Y_{c,m}|X_{c,1} X_{c,2} \ldots X_{c,m}} = P_{Y_{c,1}|X_{c,1}} P_{Y_{c,2}|X_{c,2}} \cdots P_{Y_{c,m}|X_{c,m}}, \tag{63} \]

which implies the Markov conditions $\{X_{c,j}\}_{j \ne i} \leftrightarrow X_{c,i} \leftrightarrow Y_{c,i}$ and $\{Y_{c,j}\}_{j \ne i} \leftrightarrow \{X_{c,j}\}_{j \ne i} \leftrightarrow (X_{c,i}, Y_{c,i})$; (d) holds because conditioning reduces entropy.

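Continuing this decomposition and combining it with (61), we have

\begin{align}
\sum_{i=1}^m n_i C_i &\ge \sum_{i=1}^m I(X_{c,i}^{n_i}; Y_{c,i}^{n_i}) \ge I(X_{c,1}^{n_1}, \ldots, X_{c,m}^{n_m}; Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m}) \notag \\
&\overset{(a)}{\ge} I(X^n; Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m}) \notag \\
&\overset{(b)}{=} I(Y_1^n; Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m}) + I(X^n; Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m}|Y_1^n) \notag \\
&\ge I(X^n; Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m}|Y_1^n), \tag{64}
\end{align}

where (a) is due to the data processing inequality, and (b) holds because of the Markov chain $Y_1^n \leftrightarrow X^n \leftrightarrow (Y_{c,1}^{n_1}, \ldots, Y_{c,m}^{n_m})$. At this point the similarity between (64) and (36) is quite clear. Using the same steps as in the proof of Theorem 1, the converse of Theorem 2 is proved. □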



C Proof of Theorem 3 

We first prove, for the special case $N = 2$ and without invoking Theorem 1, that $\mathcal{R}^*(D) = \mathcal{R}(D)$. The proof of Theorem 3 then follows from invoking Theorem 1 for one direction and extending the proof for $N = 2$ for the other direction.

Proof for the case of N = 2

We first prove that $\mathcal{R}^*_2(D) \subseteq \mathcal{R}_2(D)$, where the subscript 2 stands for $N = 2$. For an arbitrary rate pair $(r_1, r_2) \in \mathcal{R}^*_2(D_1, D_2)$, there exist three random variables $V_{1,1}$, $V_{1,2}$ and $V_{2,2}$, and the corresponding functions $f_1(V_{1,1}, Y_1)$ and $f_2(V_{2,2}, Y_2)$, such that

\begin{align}
r_1 &\ge I(X; V_{1,1}|Y_1) + I(X; V_{1,2}|V_{1,1}, Y_2) \tag{65} \\
r_2 &\ge I(X; V_{2,2}|V_{1,1}, V_{1,2}, Y_2) \tag{66}
\end{align}

and the distortion constraints are satisfied. Inequalities (65) and (66) imply that

\begin{align*}
r_1 &\ge I(X; V_{1,1}|Y_1) \\
r_1 + r_2 &\ge I(X; V_{1,1}|Y_1) + I(X; V_{1,2}|V_{1,1}, Y_2) + I(X; V_{2,2}|V_{1,1}, V_{1,2}, Y_2) \\
&= I(X; V_{1,1}|Y_1) + I(X; V_{1,2}, V_{2,2}|V_{1,1}, Y_2).
\end{align*}

Now define $W_1 = V_{1,1}$ and $W_2 = (V_{1,2}, V_{2,2})$, and it follows that

\begin{align*}
r_1 &\ge I(X; W_1|Y_1) \\
r_1 + r_2 &\ge I(X; W_1|Y_1) + I(X; W_2|W_1, Y_2),
\end{align*}

and $(W_1, W_2)$ is a pair of random variables satisfying the conditions for $\mathcal{R}_2(D_1, D_2)$; thus $(r_1, r_2) \in \mathcal{R}_2(D_1, D_2)$, which shows that $\mathcal{R}^*_2(D) \subseteq \mathcal{R}_2(D)$, since trivially the distortion constraints are also met.

To prove the other direction, i.e., $\mathcal{R}^*_2(D_1, D_2) \supseteq \mathcal{R}_2(D_1, D_2)$, assume $(r_1, r_2) \in \mathcal{R}_2(D_1, D_2)$. There exist random variables $W_1$ and $W_2$, and two corresponding functions $f_1(W_1, Y_1)$ and $f_2(W_2, Y_2)$, such that

\begin{align}
r_1 &\ge I(X; W_1|Y_1) \tag{67} \\
r_1 + r_2 &\ge I(X; W_1|Y_1) + I(X; W_2|W_1, Y_2) \tag{68}
\end{align}

and the distortion constraints are met. Let $\Delta r_1 = r_1 - I(X; W_1|Y_1)$. We claim that for any $0 \le \Delta r_1 \le I(X; W_2|W_1, Y_2)$, there exists a random variable $V$ such that

\begin{align}
\Delta r_1 &= I(X; V|W_1, Y_2) \tag{69} \\
I(X; V|W_1, Y_2) + I(X; W_2|W_1, V, Y_2) &= I(X; W_2|W_1, Y_2). \tag{70}
\end{align}

There are many ways to construct $V$; for example, we can take $V = (W_2(J), J)$, where $J$ is a Bernoulli random variable independent of everything else with $P(J = 1) = u$; when $J = 1$, $W_2(J) = W_2$, and $W_2(J)$ is a fixed constant otherwise. Then $I(X; V|W_1, Y_2)$ can take any real value in the interval $[0, I(X; W_2|W_1, Y_2)]$ by choosing $u$ appropriately. For a more thorough treatment of this topic in the context of rate splitting in the multiple access channel, see [17]. It follows that for this case

\begin{align}
r_1 &= I(X; W_1|Y_1) + I(X; V|W_1, Y_2) \tag{71} \\
r_2 &\ge I(X; W_1|Y_1) + I(X; W_2|W_1, Y_2) - r_1 \notag \\
&= I(X; W_2|W_1, V, Y_2). \tag{72}
\end{align}
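The claim that the time-sharing construction $V = (W_2(J), J)$ satisfies (69) and (70) can be spelled out in a few lines. The following short derivation is a sketch that uses only the independence of $J$ from all other variables and the fact that $V$ is a function of $(W_2, J)$.

```latex
\begin{align*}
I(X; V \mid W_1, Y_2)
  &= I(X; J \mid W_1, Y_2) + I(X; W_2(J) \mid J, W_1, Y_2) \\
  &= 0 + u \, I(X; W_2 \mid W_1, Y_2) + (1-u) \cdot 0
   = u \, I(X; W_2 \mid W_1, Y_2), \\[4pt]
I(X; V \mid W_1, Y_2) + I(X; W_2 \mid W_1, V, Y_2)
  &= I(X; W_2, V \mid W_1, Y_2)
   = I(X; W_2, J \mid W_1, Y_2) \\
  &= I(X; W_2 \mid W_1, Y_2) + I(X; J \mid W_1, W_2, Y_2)
   = I(X; W_2 \mid W_1, Y_2).
\end{align*}
```

Choosing $u = \Delta r_1 / I(X; W_2|W_1, Y_2)$ (when the denominator is positive) therefore gives (69), and the second chain is exactly (70).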

Now define $V_{1,1} = W_1$, $V_{1,2} = V$ and $V_{2,2} = W_2$. The random variables $(V_{1,1}, V_{1,2}, V_{2,2})$ clearly satisfy the definition given for $\mathcal{R}^*_2(D_1, D_2)$, and thus $(r_1, r_2) \in \mathcal{R}^*_2(D_1, D_2)$ for this case. On the other hand, if $\Delta r_1 > I(X; W_2|W_1, Y_2)$, then define $V_{1,1} = W_1$, $V_{1,2} = W_2$ and $V_{2,2} = W_2$. The non-negativity condition $r_2 \ge 0$ implies $r_2 \ge I(X; V_{2,2}|V_{1,1}, V_{1,2}, Y_2)$. Since the reconstruction functions $f_1(W_1, Y_1) = f_1(V_{1,1}, Y_1)$ and $f_2(W_2, Y_2) = f_2(V_{2,2}, Y_2)$ satisfy the distortion constraints, the proof is completed. □
Proof of Theorem 3

Since $\mathcal{R}^*(D)$ is an achievable region, Theorem 1 trivially gives $\mathcal{R}^*(D) \subseteq \mathcal{R}(D)$. For the inclusion in the other direction, the proof for the case $N = 2$ can be extended straightforwardly, by sequentially constructing the random variables corresponding to $\{V_{i,j}\}$, $j \ge i$. This completes the proof of Theorem 3. □

D Lower Bound on the Sum-rate for the Gaussian Source



To lower bound the sum-rate needed to achieve $(D_1, D_2)$ with side information $(Y_1, Y_2)$, consider the following quantity,

\begin{align}
& I(X; W_1|Y_1) + I(X; W_2|W_1, Y_2) \notag \\
&= H(X|Y_1) - H(X|W_1, Y_1) + H(X|W_1, Y_2) - H(X|W_1, W_2, Y_2) \notag \\
&\overset{(a)}{=} H(X|Y_1) - H(X|W_1, W_2, Y_2) - I(X; Y_2|W_1, Y_1) \tag{73} \\
&\overset{(b)}{=} H(X|Y_1) - H(X|W_1, W_2, Y_2) - H(Y_2|W_1, Y_1) + H(Y_2|X, Y_1), \tag{74}
\end{align}

where (a) follows since $I(W_1, X; Y_1|Y_2) = I(W_1; Y_1|Y_2) + I(X; Y_1|W_1, Y_2) = 0$ due to the Markov condition $W_1 \leftrightarrow X \leftrightarrow Y_2 \leftrightarrow Y_1$, which implies that $I(X; Y_1|W_1, Y_2) = H(X|W_1, Y_2) - H(X|W_1, Y_2, Y_1) = 0$. In an identical manner, (b) is due to $I(W_1; Y_2|X, Y_1) = H(Y_2|X, Y_1) - H(Y_2|X, Y_1, W_1) = 0$. The quantities $H(X|Y_1)$ and $H(Y_2|X, Y_1)$ depend only on the multi-source. We bound the second term in (74) as follows:

\begin{align}
H(X|W_1, W_2, Y_2) &= H(X - E(X|W_1, W_2, Y_2)\,|\,W_1, W_2, Y_2) \notag \\
&\le H(X - E(X|W_1, W_2, Y_2)) \notag \\
&\le H(\mathcal{N}(0, E(X - E(X|W_1, W_2, Y_2))^2)) \tag{75} \\
&\le \frac{1}{2} \log(2\pi e D_2), \tag{76}
\end{align}

where in (75) we use the fact that the normal distribution maximizes the entropy for a given second moment, and in (76) the fact that $E(X - E(X|W_1, W_2, Y_2))^2 \le D_2$ because of the existence of a function $f_2(W_1, W_2, Y_2)$ that reconstructs $X$ with distortion $D_2$.
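To bound the third term in (74), write $Y_2 = X + N_2$ as follows:

\begin{align*}
X + N_2 &= X + N_2 + \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}(N_1 + N_2) - \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}(N_1 + N_2) \\
&= \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}(X + N_1 + N_2) + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} X + \Big[ N_2 - \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}(N_1 + N_2) \Big] \\
&= \gamma Y_1 + (1 - \gamma) X + [(1 - \gamma) N_2 - \gamma N_1],
\end{align*}

where $\gamma = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$ as in Section 5. It can be seen that $[(1 - \gamma) N_2 - \gamma N_1]$ is independent of $Y_1$, by checking the fact that $E(Y_1 [(1 - \gamma) N_2 - \gamma N_1]) = 0$ and recalling that they are jointly zero-mean Gaussian. Further notice that $X$ is independent of $(N_1, N_2)$, which implies that $[(1 - \gamma) N_2 - \gamma N_1]$ is also independent of $W_1$. Thus we have

\begin{align}
H(Y_2|W_1, Y_1) &= H(\gamma Y_1 + (1 - \gamma) X + [(1 - \gamma) N_2 - \gamma N_1]\,|\,W_1, Y_1) \notag \\
&= H((1 - \gamma) X + [(1 - \gamma) N_2 - \gamma N_1]\,|\,W_1, Y_1) \notag \\
&= H((1 - \gamma)[X - E(X|W_1, Y_1)] + [(1 - \gamma) N_2 - \gamma N_1]\,|\,W_1, Y_1) \notag \\
&\le H((1 - \gamma)[X - E(X|W_1, Y_1)] + [(1 - \gamma) N_2 - \gamma N_1]) \notag \\
&\le H(\mathcal{N}(0, E\{(1 - \gamma)[X - E(X|W_1, Y_1)] + [(1 - \gamma) N_2 - \gamma N_1]\}^2)) \tag{77} \\
&\le H(\mathcal{N}(0, (1 - \gamma)^2 D_1 + (1 - \gamma)^2 \sigma_2^2 + \gamma^2 \sigma_1^2)) \tag{78} \\
&= \frac{1}{2} \log[2\pi e((1 - \gamma)^2 D_1 + (1 - \gamma)^2 \sigma_2^2 + \gamma^2 \sigma_1^2)] \notag \\
&= \frac{1}{2} \log[2\pi e((1 - \gamma)^2 D_1 + \gamma \sigma_1^2)], \tag{79}
\end{align}

where in (78) we used the fact that $[X - E(X|W_1, Y_1)]$ is independent of $[(1 - \gamma) N_2 - \gamma N_1]$. Using (76) and (79) in (74) gives

\[ R_1 + R_2 \ge \frac{1}{2} \log \frac{\sigma_x^2 \sigma_1^2 \sigma_2^2}{D_2(\sigma_x^2 + \sigma_1^2 + \sigma_2^2)\left((1 - \gamma)^2 D_1 + \gamma \sigma_1^2\right)}. \tag{80} \]

Note that this lower bound is only tight and achievable when both $D_1$ and $D_2$ are effective, i.e., in Region I. When $D_2$ is not effective, the bound

\[ R_1 + R_2 \ge R_1 \ge \frac{1}{2} \log \frac{\sigma_x^2(\sigma_1^2 + \sigma_2^2)}{D_1(\sigma_x^2 + \sigma_1^2 + \sigma_2^2)} \]

is in fact achievable with equality. By comparing the above two bounds, it can be seen that (80) is the larger of the two, so that both constraints are effective, when $D_2 \le \frac{\gamma \sigma_1^2 D_1}{(1 - \gamma)^2 D_1 + \gamma \sigma_1^2}$, or equivalently $D_1 \ge \frac{\gamma \sigma_1^2 D_2}{\gamma \sigma_1^2 - (1 - \gamma)^2 D_2}$ when $D_2 \le D_2^*$. □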
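The two algebraic facts used in this appendix, namely that the residual $(1 - \gamma) N_2 - \gamma N_1$ is uncorrelated with $Y_1$ and that its variance simplifies to $\gamma \sigma_1^2$, can be checked with a few lines of arithmetic. The sketch below assumes $\gamma = \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$ and uses a hypothetical function name.

```python
def decomposition_check(var_1, var_2):
    """Return (gamma, cross-covariance with Y1, residual variance) for the split of Y2."""
    gamma = var_2 / (var_1 + var_2)
    # E[(X + N1 + N2) * ((1-gamma)*N2 - gamma*N1)] with X, N1, N2 independent and zero mean
    cross = (1.0 - gamma) * var_2 - gamma * var_1
    # Var((1-gamma)*N2 - gamma*N1)
    resid_var = (1.0 - gamma) ** 2 * var_2 + gamma ** 2 * var_1
    return gamma, cross, resid_var
```

For example, with $\sigma_1^2 = 2$ and $\sigma_2^2 = 0.5$ one gets $\gamma = 0.2$, zero cross-covariance, and a residual variance of $0.4 = \gamma \sigma_1^2$, which is the simplification used in passing from (78) to (79).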



E Proof of the Theorem and Corollaries for the DSBS

E.1 Proof of Theorem 8

We will need the following lemma from [8] to simplify the calculation.

Lemma 1 For $(W_1, W_2) \in \mathcal{P}(D_1, D_2)$

\[ I(X; W_1) + I(X; W_2|Y, W_1) = H(X) - H(Y|W_1) + H(Y|W_1 W_2) - H(X|W_1 W_2). \tag{81} \]
The lower bound

Let $(W_1, W_2) \in \mathcal{P}(D_1, D_2)$ define a joint distribution with $(X, Y)$. Furthermore, assume the functions $f_1$ and $f_2$ are optimal for these random variables, i.e., there do not exist $f_1'$ (or $f_2'$) such that $E\,d(X, f_1'(W_1)) < E\,d(X, f_1(W_1))$ (or $E\,d(X, f_2'(W_1, W_2, Y)) < E\,d(X, f_2(W_1, W_2, Y))$), because otherwise we can consider the alternative functions $f_1'$ (or $f_2'$) without loss of optimality. Our goal is to show that $I(X; W_1) + I(X; W_2|Y, W_1) \ge S^*(D_1, D_2)$, and then invoke the rate distortion theorem, by which the lower bound can be established.

Similarly to [4] [8], define the following set

\[ A = \{(w_1, w_2) : f_2(w_1, w_2, 0) = f_2(w_1, w_2, 1)\}, \tag{82} \]

whose complement is

\[ A^c = \mathcal{W}_1 \times \mathcal{W}_2 - A = \{(w_1, w_2) : f_2(w_1, w_2, 0) \ne f_2(w_1, w_2, 1)\}. \tag{83} \]

For each $w_1 \in \mathcal{W}_1$, define the following two sets

\begin{align*}
B(w_1) &= \{w_2 \in \mathcal{W}_2 : (w_1, w_2) \in A,\ f_1(w_1) = f_2(w_1, w_2, 0)\}, \\
B^*(w_1) &= \{w_2 \in \mathcal{W}_2 : (w_1, w_2) \in A,\ f_1(w_1) \ne f_2(w_1, w_2, 0)\}.
\end{align*}

Notice that for each fixed $w_1^* \in \mathcal{W}_1$, we have $\mathcal{W}_2 = B(w_1^*) \cup B^*(w_1^*) \cup \{w_2 : (w_1^*, w_2) \in A^c\}$, and the three sets are disjoint. To simplify the notation, write $P\{(W_1, W_2) = (w_1, w_2)\}$ as $P_{w_1 w_2}$, and $P\{W_1 = w_1\}$ as $P_{w_1}$. Define the following quantity for each $w_1 \in \mathcal{W}_1$

\[ D_{1,w_1} = E[d(X, \hat{X}_1)|W_1 = w_1] = P\{X \ne f_1(w_1)|W_1 = w_1\}, \]

and define the following quantity for each $(w_1, w_2) \in A$,

\[ D_{2,w_1 w_2} = E[d(X, \hat{X}_2)|(W_1, W_2) = (w_1, w_2)] = P\{X \ne f_2(w_1, w_2, 0)|(W_1, W_2) = (w_1, w_2)\}. \]
By the Markov string $Y \leftrightarrow X \leftrightarrow (W_1, W_2)$, it follows that for each $w_1 \in \mathcal{W}_1$

\[ H(X|W_1 = w_1) = h(D_{1,w_1}), \qquad H(Y|W_1 = w_1) = h(p * D_{1,w_1}), \tag{84} \]

where as before $u * v = u(1 - v) + v(1 - u)$. For each $(w_1, w_2) \in A$, we have

\[ H[X|(W_1, W_2) = (w_1, w_2)] = h(D_{2,w_1 w_2}), \qquad H[Y|(W_1, W_2) = (w_1, w_2)] = h(p * D_{2,w_1 w_2}). \tag{85} \]

Furthermore, for each $(w_1, w_2) \in A^c$, we have

\begin{align}
H[X|(W_1, W_2) = (w_1, w_2)] &= h(P\{X \ne f_1(w_1)|W_1 = w_1, W_2 = w_2\}), \notag \\
H[Y|(W_1, W_2) = (w_1, w_2)] &= h(p * P\{X \ne f_1(w_1)|W_1 = w_1, W_2 = w_2\}). \tag{86}
\end{align}

We will also need the following quantities

\[ \theta = P\{(W_1, W_2) \in A\}, \qquad \theta_1 = P\{(W_1, W_2) \in \{(w_1, w_2) : w_2 \in B(w_1)\}\}. \tag{87} \]

Clearly, we have

\begin{align}
H(X) - H(Y|W_1) &= 1 - \sum_{w_1} P_{w_1} H(Y|W_1 = w_1) \notag \\
&\ge 1 - h(p * D_1'), \tag{88}
\end{align}

where we have used the concavity of the function $h(p * u)$ in the last step and

\[ D_1' = \sum_{w_1} P_{w_1} D_{1,w_1}. \]

Furthermore we have 



if (y|WiW 2 ) - ff(X|W^W 2 ) 
= P ^, W AH(Y\(W 1 ,W 2 ) = (w 1 ,w 2 ))-H(X\(W 1 ,W 2 ) = ( Wl ,w 2 ))} 

+ £ ^W^OI^i, Wa) = K.^j-^IK^,^) = K,w 2 ))] 
The first term can be bounded as follows 

£ p^^l^FK^!,^) = (wi,w 2 )) -^(xk^x,^) = K,™ 2 ))] 

(wi,t«2)e^4 

= £ £ ^.^[Mp*-^,™^) - h(D 2>WlW2 )] 

u>! w 2 eB(w 1 ) 

+ £ -P^i,^ 2 [^(P * ^2,tuiu;2) ~~ h(D 2>WlW2 )] 

wi w 2 £B*(wi) 

> e 1 G((3) + (e-9 1 )G(a), (89) 
where as before G(u) = h(p * u) — h(u), and 

a = £ S J^-D 2 , W1W2) /? = £ £ ' D 2,mw2, (9°) 

">i -iu 2 e-B*(tyi) 1 t"i ?U26-B(tui) 1 

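and the convexity of the function $G(u)$ is used in the last step. Next, notice the identity that for each $w_1 \in \mathcal{W}_1$,

\[
\begin{aligned}
P_{w_1} D_{1,w_1} &= P\{X \neq f_1(w_1), W_1 = w_1\} \\
&= \sum_{w_2 \in B(w_1)} P\{X \neq f_2(w_1,w_2,0), W_1 = w_1, W_2 = w_2\} \\
&\quad + \sum_{w_2 \in B^*(w_1)} P\{X = f_2(w_1,w_2,0), W_1 = w_1, W_2 = w_2\} \\
&\quad + \sum_{w_2 : (w_1,w_2) \in A^c} P\{X \neq f_1(w_1), W_1 = w_1, W_2 = w_2\} \\
&= \sum_{w_2 \in B(w_1)} P_{w_1w_2} D_{2,w_1w_2} + \sum_{w_2 \in B^*(w_1)} P_{w_1w_2} (1 - D_{2,w_1w_2}) \\
&\quad + \sum_{w_2 : (w_1,w_2) \in A^c} P_{w_1w_2} P\{X \neq f_1(w_1) \mid W_1 = w_1, W_2 = w_2\}.
\end{aligned} \tag{91}
\]
It follows that

\[
\begin{aligned}
&\sum_{(w_1,w_2) \in A^c} P_{w_1w_2} \left[ H(Y \mid (W_1,W_2) = (w_1,w_2)) - H(X \mid (W_1,W_2) = (w_1,w_2)) \right] \\
&\quad = \sum_{w_1} \sum_{w_2 : (w_1,w_2) \in A^c} P_{w_1w_2}\, G\!\left[ P\{X \neq f_1(w_1) \mid (W_1,W_2) = (w_1,w_2)\} \right] \\
&\quad \ge (1 - \theta) G(\gamma),
\end{aligned} \tag{92}
\]
where again the convexity of the function $G(u)$ is used, and because of the identity (91), we have

\[
\gamma = \sum_{w_1} \sum_{w_2 : (w_1,w_2) \in A^c} \frac{P_{w_1w_2}}{1 - \theta} P\{X \neq f_1(w_1) \mid W_1 = w_1, W_2 = w_2\}
= \frac{D_1' - \theta_1 \beta - (\theta - \theta_1)(1 - \alpha)}{1 - \theta}. \tag{93}
\]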
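For completeness, the second equality in (93) can be recovered by summing the identity (91) over $w_1$. The following worked step is a sketch that uses only the definitions in (87) and (90), and it assumes that $D_1'$ denotes $\sum_{w_1} P_{w_1} D_{1,w_1}$ as in the concavity step of (88) and that $B(w_1)$ and $B^*(w_1)$ together exhaust the pairs in $A$:

```latex
\begin{align*}
D_1' &= \sum_{w_1} P_{w_1} D_{1,w_1} \\
     &= \sum_{w_1}\sum_{w_2 \in B(w_1)} P_{w_1w_2} D_{2,w_1w_2}
      + \sum_{w_1}\sum_{w_2 \in B^*(w_1)} P_{w_1w_2}\,(1 - D_{2,w_1w_2}) \\
     &\quad + \sum_{w_1}\sum_{w_2 : (w_1,w_2) \in A^c} P_{w_1w_2}\,
        P\{X \neq f_1(w_1) \mid W_1 = w_1, W_2 = w_2\} \\
     &= \theta_1 \beta + (\theta - \theta_1)(1 - \alpha) + (1 - \theta)\gamma .
\end{align*}
```

Solving the last line for $\gamma$ gives the closed form on the right-hand side of (93).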

It was shown in [8], by a straightforward generalization of the argument in [4], that 

\[
E[d(X, \hat{X}_2) \mid (W_1,W_2) \in A^c] \ge p. \tag{94}
\]

By the hypothesis,

\[
D_2' = \theta_1 \beta + (\theta - \theta_1)\alpha + (1 - \theta)p \le D_2, \qquad D_1' \le D_1.
\]

Notice that for each $(w_1,w_2) \in A$, $D_{2,w_1w_2} \le p$, because otherwise, for this $(w_1,w_2)$ pair, setting $f_2(w_1,w_2,Y) = Y$ would in fact reduce the distortion, which contradicts the optimality of the decoding function. Thus $0 \le \alpha, \beta \le p$. Similarly, $p \le \gamma \le 1 - p$, because $p \le P\{X \neq f_1(w_1) \mid W_1 = w_1, W_2 = w_2\} \le 1 - p$ for each $(w_1,w_2) \in A^c$; otherwise we can modify the decoder function $f_2$ to reduce the distortion. Clearly, $0 \le \theta_1 \le \theta \le 1$ by definition.
Summarizing the bounds, we have shown that 

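\[
R_{HB}(D_1,D_2) \ge \min_{(\alpha,\beta,\theta,\theta_1,D_1') \in \mathcal{Q}_{\le}} \left[ 1 - h(D_1' * p) + (1 - \theta)G(\gamma) + \theta_1 G(\beta) + (\theta - \theta_1)G(\alpha) \right], \tag{95}
\]
where the minimization is within the following set

\[
\begin{aligned}
\mathcal{Q}_{\le} = \{ (\alpha,\beta,\theta,\theta_1,D_1') :\ & (1 - \theta)p \le D_1' - (\theta - \theta_1)(1 - \alpha) - \theta_1\beta \le (1 - \theta)(1 - p), \\
& 0 \le \theta_1 \le \theta \le 1,\ 0 \le \alpha, \beta \le p,\ (\theta - \theta_1)\alpha + \theta_1\beta + (1 - \theta)p \le D_2,\ D_1' \le D_1 \}.
\end{aligned}
\]

This is not yet the function given in Theorem 5, because the minimization given there is within the set

\[
\begin{aligned}
\mathcal{Q}_{=} = \{ (\alpha,\beta,\theta,\theta_1,D_1') :\ & (1 - \theta)p \le D_1' - (\theta - \theta_1)(1 - \alpha) - \theta_1\beta \le (1 - \theta)(1 - p), \\
& 0 \le \theta_1 \le \theta \le 1,\ 0 \le \alpha, \beta \le p,\ (\theta - \theta_1)\alpha + \theta_1\beta + (1 - \theta)p = D_2,\ D_1' = D_1 \}.
\end{aligned}
\]

This gap will be closed after we give the forward test channel structure. □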
The upper bound 








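\[
\begin{array}{c|cc|cc}
 & \multicolumn{2}{c|}{w_1 = 0} & \multicolumn{2}{c}{w_1 = 1} \\
 & x = 0 & x = 1 & x = 0 & x = 1 \\ \hline
w_2 = 0 & 0.5\,\theta_1(1-\beta) & 0.5\,\theta_1\beta & 0.5(\theta-\theta_1)(1-\alpha) & 0.5(\theta-\theta_1)\alpha \\
w_2 = 1 & 0.5(\theta-\theta_1)\alpha & 0.5(\theta-\theta_1)(1-\alpha) & 0.5\,\theta_1\beta & 0.5\,\theta_1(1-\beta) \\
w_2 = 2 & 0.5(1-\theta)(1-\gamma) & 0.5(1-\theta)\gamma & 0.5(1-\theta)\gamma & 0.5(1-\theta)(1-\gamma) \\ \hline
p(x, w_1) & 0.5(1-D_1) & 0.5\,D_1 & 0.5\,D_1 & 0.5(1-D_1)
\end{array}
\]

Table 1: Joint distribution $p(x, w_1, w_2)$ and the marginal $p(x, w_1)$.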



We explicitly construct the random variables with joint pmf given in Table 1. It is straightforward to verify that it is a valid pmf, given the conditions in the definition of $(\alpha, \beta, \theta, \theta_1)$. Furthermore, the rate $I(X; W_1) + I(X; W_2 \mid W_1 Y)$ is exactly $S_{D_1}(\alpha, \beta, \theta, \theta_1)$. The decoding functions are $f_1(W_1) = W_1$ and $f_2(W_1, W_2, Y) = W_2$ if $W_2 \neq 2$, otherwise $f_2(W_1, W_2, Y) = Y$. This establishes the upper bound.
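The two verifications mentioned above (validity of the pmf and the value of the rate) can be sketched numerically. The Python code below builds the joint pmf of Table 1, attaches $Y = X \oplus Z$ with $Z \sim \mathrm{Bernoulli}(p)$ independent of $(X, W_1, W_2)$, and compares the computed rate with $1 - h(D_1 * p) + (\theta-\theta_1)G(\alpha) + \theta_1 G(\beta) + (1-\theta)G(\gamma)$. The parameter values and the helper names (`table1_pmf`, `entropy`, `rate`, `S`) are assumptions made for illustration only.

```python
import math
from itertools import product

def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def conv(p, u):
    """Binary convolution p * u = p(1 - u) + (1 - p)u."""
    return p * (1.0 - u) + (1.0 - p) * u

def G(u, p):
    return h(conv(p, u)) - h(u)

def gamma_of(alpha, beta, theta, theta1, D1):
    """gamma as in (93), with D1 in place of D1'."""
    return (D1 - (theta - theta1) * (1 - alpha) - theta1 * beta) / (1 - theta)

def table1_pmf(alpha, beta, theta, theta1, D1):
    """Joint pmf p(x, w1, w2) of Table 1, keyed by (x, w1, w2)."""
    g = gamma_of(alpha, beta, theta, theta1, D1)
    pmf = {}
    for x, w1 in product((0, 1), repeat=2):
        wrong = x != w1  # decoder 1 outputs w1
        pmf[(x, w1, w1)] = 0.5 * theta1 * (beta if wrong else 1 - beta)
        pmf[(x, w1, 1 - w1)] = 0.5 * (theta - theta1) * ((1 - alpha) if wrong else alpha)
        pmf[(x, w1, 2)] = 0.5 * (1 - theta) * (g if wrong else 1 - g)
    return pmf

def entropy(joint, keep):
    """Entropy in bits of the marginal of `joint` on coordinate indices `keep`."""
    marg = {}
    for key, pr in joint.items():
        k = tuple(key[i] for i in keep)
        marg[k] = marg.get(k, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

def rate(pmf, p):
    """I(X;W1) + I(X;W2|W1,Y); coordinates are 0 = x, 1 = w1, 2 = w2, 3 = y."""
    joint = {}
    for (x, w1, w2), pr in pmf.items():
        for y in (0, 1):
            joint[(x, w1, w2, y)] = pr * (p if y != x else 1 - p)
    H = lambda *idx: entropy(joint, idx)
    i_x_w1 = H(0) + H(1) - H(0, 1)
    i_x_w2_given_w1y = H(0, 1, 3) + H(1, 2, 3) - H(0, 1, 2, 3) - H(1, 3)
    return i_x_w1 + i_x_w2_given_w1y

def S(alpha, beta, theta, theta1, D1, p):
    g = gamma_of(alpha, beta, theta, theta1, D1)
    return (1 - h(conv(p, D1)) + (theta - theta1) * G(alpha, p)
            + theta1 * G(beta, p) + (1 - theta) * G(g, p))

# Hypothetical parameter values satisfying the constraints (assumptions).
p, D1 = 0.2, 0.3
params = dict(alpha=0.1, beta=0.05, theta=0.5, theta1=0.3)

pmf = table1_pmf(D1=D1, **params)
total = sum(pmf.values())
marg_x1_w0 = sum(pr for (x, w1, _), pr in pmf.items() if (x, w1) == (1, 0))
R = rate(pmf, p)
S_val = S(D1=D1, p=p, **params)
print(f"sum = {total:.6f}, p(x=1, w1=0) = {marg_x1_w0:.6f}, rate = {R:.6f}, S = {S_val:.6f}")
```

The generic entropy routine is deliberately brute force, so the agreement between `R` and `S_val` does not rely on the closed-form manipulation being checked.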

Now we show that the gap mentioned in the proof of the lower bound can be closed. Suppose that the parameters that minimize the right-hand side of (95) are $(\alpha, \beta, \theta, \theta_1, D_1')$, and furthermore $D_1' < D_1$. The set of random variables $W_1', W_2'$ can be constructed as given in Table 1 with $D_1'$ replacing $D_1$. By the lower bound established above, we have

\[
R_{HB}(D_1, D_2) \ge I(X; W_1') + I(X; W_2' \mid W_1' Y). \tag{96}
\]

Consider a random variable $W_1'' = W_1' \oplus N$, where $N$ is a Bernoulli random variable independent of everything else with $P(N = 1) = \eta$ such that $\eta * D_1' = D_1 = D_1''$, which is valid since $\max\{D_1, D_1'\} \le \frac{1}{2}$. Let $W_2'' = (W_1', W_2')$, and we have $(W_1'', W_2'') \in \mathcal{P}(D_1, D_2)$. Clearly, $W_1'' \leftrightarrow W_1' \leftrightarrow X \leftrightarrow Y$, and $W_1'' \leftrightarrow W_1' \leftrightarrow W_2'$. Thus by the rate distortion theorem for this problem

\[
I(X; W_1'') + I(X; W_2'' \mid W_1'' Y) \ge R_{HB}(D_1, D_2). \tag{97}
\]
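The crossover probability $\eta$ used to degrade $W_1'$ into $W_1''$ has a simple closed form, since $\eta * D_1' = \eta(1 - D_1') + (1 - \eta)D_1'$. The Python sketch below solves for $\eta$ and evaluates the entropy difference $h(p * D_1'') - h(p * D_1')$, which is positive whenever $p < 0.5$ and $D_1' < D_1'' \le 0.5$. The function name `eta_for` and the numerical values are assumptions for illustration.

```python
import math

def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def conv(p, u):
    """Binary convolution p * u = p(1 - u) + (1 - p)u."""
    return p * (1.0 - u) + (1.0 - p) * u

def eta_for(D_prime, D):
    """Solve eta * D_prime = D for eta, assuming 0 <= D_prime <= D <= 1/2 and D_prime < 1/2."""
    if not (0.0 <= D_prime <= D <= 0.5) or D_prime >= 0.5:
        raise ValueError("need 0 <= D' <= D <= 1/2 with D' < 1/2")
    return (D - D_prime) / (1.0 - 2.0 * D_prime)

# Hypothetical values (assumptions): D1' is strictly smaller than D1.
p, D1_prime, D1 = 0.2, 0.2, 0.3

eta = eta_for(D1_prime, D1)
gap = h(conv(p, D1)) - h(conv(p, D1_prime))  # h(p * D1'') - h(p * D1') with D1'' = D1

print(f"eta = {eta:.6f}, eta * D1' = {conv(eta, D1_prime):.6f}, gap = {gap:.6f}")
```

The resulting $\eta$ always lies in $[0, 0.5]$ under the stated assumptions, so $N$ is a valid Bernoulli noise.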

Notice that

\[
\begin{aligned}
I(X; W_1') + I(X; W_2' \mid W_1' Y)
&\overset{(a)}{=} I(X; W_1', W_1'') + I(X; W_1'', W_2' \mid W_1' Y) \\
&= I(X; W_1'') + I(X; W_1' \mid W_1'') + I(X; W_2' \mid W_1' W_1'' Y) \\
&\overset{(b)}{=} I(X; W_1'') + I(X; W_1' \mid W_1'') + I(X; W_1' W_2' \mid W_1'' Y) - I(X; W_1' \mid W_1'' Y) \\
&\overset{(c)}{=} I(X; W_1'') + I(X; W_1' W_2' \mid W_1'' Y) + I(Y; W_1' \mid W_1'') \\
&= I(X; W_1'') + I(X; W_1' W_2' \mid W_1'' Y) + h(p * D_1'') - h(p * D_1') \\
&> I(X; W_1'') + I(X; W_1' W_2' \mid W_1'' Y),
\end{aligned}
\]
where (a) and (c) follow because of the Markov chain $W_1'' \leftrightarrow W_1' \leftrightarrow X \leftrightarrow Y$, (b) is by applying the chain rule to the last term in the previous line, and the last step is because $p < 0.5$ and $D_1' < D_1 = D_1'' \le 0.5$. However, this implies

\[
\begin{aligned}
I(X; W_1'') + I(X; W_1' W_2' \mid W_1'' Y) &\ge R_{HB}(D_1, D_2) \\
&\ge I(X; W_1') + I(X; W_2' \mid W_1' Y) > I(X; W_1'') + I(X; W_1' W_2' \mid W_1'' Y),
\end{aligned}
\]
which is a contradiction. Thus we conclude that the minimum must be achieved with $D_1' = D_1$.

Next we show that the constraint $(\theta - \theta_1)\alpha + \theta_1\beta + (1 - \theta)p \le D_2$ can be met with equality without loss of optimality; i.e.,

\[
\begin{aligned}
&\min_{(\alpha,\beta,\theta,\theta_1,D_1') \in \mathcal{Q}_{\le}} \left[ 1 - h(D_1' * p) + (1 - \theta)G(\gamma) + \theta_1 G(\beta) + (\theta - \theta_1)G(\alpha) \right] \\
&\quad = \min_{(\alpha,\beta,\theta,\theta_1,D_1') \in \mathcal{Q}_{=}} \left[ 1 - h(D_1' * p) + (1 - \theta)G(\gamma) + \theta_1 G(\beta) + (\theta - \theta_1)G(\alpha) \right].
\end{aligned} \tag{98}
\]

Suppose otherwise, such that the parameters $(\alpha, \beta, \theta, \theta_1, D_1)$ minimizing the right-hand side of Eqn. (95) satisfy $(\theta - \theta_1)\alpha + \theta_1\beta + (1 - \theta)p < D_2$, and any parameters $(\alpha, \beta, \theta, \theta_1, D_1) \in \mathcal{Q}_{=}$ will result in a strict increase in the rate. If $\theta = 0$, the contradiction is trivial: either $\alpha$ or $\beta$ can be increased to reduce the rate. When $\theta < 1$, but $\alpha, \beta < p$, $\gamma \in (p, 0.5) \cup (0.5, 1 - p)$ and $0 < \theta_1 < \theta$, it is also trivial to construct such parameters, by perturbing (incrementally) $\alpha$ or $\beta$. Thus the only remaining cases are the following, and we will ignore the term $1 - h(p * D_1)$ in the sequel:

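• $p < \gamma < 0.5$, $\alpha = p$ and $\theta_1 < \theta$. In this case, notice that

\[
\begin{aligned}
(1 - \theta)G(\gamma) + \theta_1 G(\beta) + (\theta - \theta_1)G(\alpha) &= (1 - \theta)G(\gamma) + \theta_1 G(\beta) + (\theta - \theta_1)G(1 - \alpha) \\
&> (1 - \theta_1)\, G\!\left( \frac{D_1 - \theta_1\beta}{1 - \theta_1} \right) + \theta_1 G(\beta),
\end{aligned}
\]
where the equality uses the symmetry $G(u) = G(1 - u)$ and the inequality is due to the strict convexity of $G(u)$. Furthermore, notice that $p \le \frac{D_1 - \theta_1\beta}{1 - \theta_1} \le 1 - p$, since it is a convex combination of $\gamma$ and $1 - p$. However, this implies that the set of parameters $(p, \beta, \theta_1, \theta_1)$ strictly improves over the minimum, which is a contradiction.

• $p < \gamma < 0.5$ and $\theta = \theta_1$. Let $\epsilon$ be a small positive quantity to be specified later. First notice that the condition implies $\beta < p$ for any $D_2 < p$; then

\[
\begin{aligned}
(1 - \theta)G(\gamma) + \theta G(\beta) &= (1 - \theta - \epsilon)G(\gamma) + \epsilon G(\gamma) + \theta G(\beta) \\
&> (1 - \theta - \epsilon)G(\gamma) + (\theta + \epsilon)G(\beta'),
\end{aligned}
\]
where the inequality is due to the strict convexity of $G(u)$ and

\[
\beta' = \frac{\epsilon (D_1 - \theta\beta)}{(\epsilon + \theta)(1 - \theta)} + \frac{\theta\beta}{\epsilon + \theta}. \tag{99}
\]

Notice further that

\[
\gamma = \frac{D_1 - \theta\beta}{1 - \theta} = \frac{D_1 - (\theta + \epsilon)\beta'}{1 - \theta - \epsilon}, \tag{100}
\]

thus by choosing a sufficiently small $\epsilon > 0$, the following two conditions can be satisfied simultaneously:

\[
(\theta + \epsilon)\beta' + (1 - \theta - \epsilon)p = \theta\beta + (1 - \theta)p + \epsilon(\gamma - p) \le D_2, \qquad \beta' \le p. \tag{101}
\]

This implies that $(p, \beta', \theta + \epsilon, \theta + \epsilon)$ strictly improves over the minimum, which is a contradiction.

• $0.5 < \gamma < 1 - p$, $\beta = p$ and $\theta_1 > 0$. The contradiction is constructed similarly to the first case.

• $0.5 < \gamma < 1 - p$ and $\theta_1 = 0$. This is an impossible case, since $\alpha \le p$ and $D_1 \le 0.5$.

• $\gamma = 0.5$ and $0 < \theta_1 < \theta$, $0 < \alpha, \beta < p$. In this case, perturbing $\alpha, \beta$ together incrementally gives a contradiction.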

Thus there is no loss of optimality by replacing the optimization set $\mathcal{Q}_{\le}$ with $\mathcal{Q}_{=}$, and this completes the proof.

□ 
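The perturbation used in the second case of the list above (the case $p < \gamma < 0.5$, $\theta = \theta_1$) can be illustrated numerically. The Python sketch below picks hypothetical values of $p$, $D_1$, $\theta$, $\beta$ and $\epsilon$ (assumptions, not taken from the paper), forms $\beta'$ as in (99), and checks that $\gamma$ is unchanged as in (100), that the objective strictly decreases, and that the second-stage distortion expression grows by exactly $\epsilon(\gamma - p)$.

```python
import math

def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def conv(p, u):
    """Binary convolution p * u = p(1 - u) + (1 - p)u."""
    return p * (1.0 - u) + (1.0 - p) * u

def G(u, p):
    return h(conv(p, u)) - h(u)

# Hypothetical values (assumptions) with theta = theta_1 and p < gamma < 0.5.
p, D1 = 0.2, 0.2
theta, beta = 0.5, 0.05
eps = 0.05

gamma = (D1 - theta * beta) / (1 - theta)

# beta' from (99), written as the convex combination of gamma and beta.
beta_new = (eps * gamma + theta * beta) / (eps + theta)

# gamma recomputed for the perturbed parameters, as in (100).
gamma_new = (D1 - (theta + eps) * beta_new) / (1 - theta - eps)

# Objective without the common term 1 - h(p * D1).
old_obj = (1 - theta) * G(gamma, p) + theta * G(beta, p)
new_obj = (1 - theta - eps) * G(gamma, p) + (theta + eps) * G(beta_new, p)

# Second-stage distortion expressions before and after the perturbation.
old_D2 = theta * beta + (1 - theta) * p
new_D2 = (theta + eps) * beta_new + (1 - theta - eps) * p

print(f"gamma = {gamma:.4f}, gamma_new = {gamma_new:.4f}")
print(f"objective: {old_obj:.6f} -> {new_obj:.6f}")
print(f"distortion: {old_D2:.6f} -> {new_D2:.6f}")
```

Since the distortion expression increases only by $\epsilon(\gamma - p)$, a small enough $\epsilon$ keeps it within any slack left below $D_2$.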

E.2 Proof of Corollary Q] 

Notice that for any $(\alpha, \beta, \theta, \theta_1)$,

\[
\begin{aligned}
S_{D_1}(\alpha, \beta, \theta, \theta_1) &\ge 1 - h(D_1 * p) + (\theta - \theta_1)G(\alpha) + \theta_1 G(\beta) \\
&\ge 1 - h(D_1 * p) + \theta G(\beta'),
\end{aligned}
\]
where $\beta' = \frac{(\theta - \theta_1)\alpha + \theta_1\beta}{\theta}$, and the first inequality is due to the non-negativity of the function $G(u)$, while the second inequality is due to its convexity. Furthermore, the constraint is satisfied with

\[
D_2 = (\theta - \theta_1)\alpha + \theta_1\beta + (1 - \theta)p = \theta\beta' + (1 - \theta)p.
\]

Let $(\alpha, \beta, \theta, \theta_1)$ be the set of parameters achieving the minimum. Then by Theorem 5 we have

\[
R_{HB}(D_1, D_2) = S_{D_1}(\alpha, \beta, \theta, \theta_1) \ge \left[ 1 - h(D_1 * p) + \theta G(\beta') \right],
\]
where $D_2 = \theta\beta' + (1 - \theta)p$. Moreover $0 \le \beta' \le p$, because both $\alpha$ and $\beta$ are in this range, and $\beta'$ is a convex combination of them. Thus

\[
R_{HB}(D_1, D_2) \ge 1 - h(D_1 * p) + \min_{D_2 = \theta\beta' + (1 - \theta)p} \left[ \theta G(\beta') \right],
\]
with the minimization range $0 \le \beta' \le p$ and $0 \le \theta \le 1$. Comparing it with the rate distortion function $R^*_{X|Y}(D)$ of (35) establishes the claim. □
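The minimization appearing in the bound above can be approximated by a simple grid search. The Python sketch below is one possible numerical evaluation under stated assumptions: for a target $D_2 < p$, it sweeps $\beta' \in [0, D_2]$, sets $\theta = (p - D_2)/(p - \beta')$ so that $D_2 = \theta\beta' + (1-\theta)p$ holds, and keeps the smallest $\theta G(\beta')$. The function names `wz_rate` and `hb_lower_bound`, the grid size and the test values are illustrative choices, not part of the paper.

```python
import math

def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def conv(p, u):
    """Binary convolution p * u = p(1 - u) + (1 - p)u."""
    return p * (1.0 - u) + (1.0 - p) * u

def G(u, p):
    return h(conv(p, u)) - h(u)

def wz_rate(D, p, steps=2000):
    """Grid approximation of min theta*G(beta') s.t. D = theta*beta' + (1 - theta)*p."""
    if D >= p:
        return 0.0                      # theta = 0: decode from Y alone
    best = G(D, p)                      # theta = 1, beta' = D
    for k in range(steps):
        b = D * k / steps               # beta' in [0, D)
        theta = (p - D) / (p - b)       # enforces the distortion constraint
        best = min(best, theta * G(b, p))
    return best

def hb_lower_bound(D1, D2, p):
    """Lower bound 1 - h(D1 * p) + min theta*G(beta') from the display above."""
    return 1.0 - h(conv(p, D1)) + wz_rate(D2, p)

# Hypothetical operating points (assumptions).
p = 0.2
for D2 in (0.05, 0.19):
    print(f"D2 = {D2}: grid minimum = {wz_rate(D2, p):.6f}, G(D2) = {G(D2, p):.6f}")
print(f"bound at (D1, D2) = (0.3, 0.05): {hb_lower_bound(0.3, 0.05, p):.6f}")
```

Because the grid minimum is initialized at the point $\theta = 1$, the returned value is never larger than $G(D_2)$; for $D_2$ close to $p$ time sharing with $\theta < 1$ gives a strictly smaller value.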

E.3 Proof of Corollary H 

In [4], it was proved that when $D_2 \le d_c$, $R^*_{X|Y}(D_2) = G(D_2)$, and by the corollary proved in E.2, $R_{HB}(D_1, D_2) \ge 1 - h(D_1 * p) + G(D_2)$ for this case. To show $R_{HB}(D_1, D_2) \le 1 - h(D_1 * p) + G(D_2)$, consider the following test channel. Let $W_2$ be the output of a binary symmetric channel (BSC) with crossover probability $D_2$ and input $X$, and let $W_1$ be the (cascade) output of a BSC with crossover probability $\eta$ and input $W_2$, such that $\eta * D_2 = D_1$; such an $\eta$ always exists because $D_2 \le D_1$. It can then be easily verified that

\[
I(X; W_1) + I(X; W_2 \mid W_1, Y) = 1 - h(D_1 * p) + G(D_2) \tag{102}
\]

and the distortions are $D_1$ and $D_2$ by taking $f_1(W_1) = W_1$ and $f_2(W_1, W_2, Y) = W_2$. The rate distortion theorem for this problem implies that $R_{HB}(D_1, D_2) \le 1 - h(D_1 * p) + G(D_2)$, which completes the proof. □
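The identity (102) and the two distortion values can be checked by brute force. The Python sketch below builds the joint pmf of $(X, W_1, W_2, Y)$ for the cascade test channel, with $X$ uniform and $Y$ the output of a BSC with crossover probability $p$ and input $X$, and compares the computed rate with $1 - h(D_1 * p) + G(D_2)$. The helper names and the numerical values are assumptions for illustration.

```python
import math
from itertools import product

def h(u):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def conv(p, u):
    """Binary convolution p * u = p(1 - u) + (1 - p)u."""
    return p * (1.0 - u) + (1.0 - p) * u

def G(u, p):
    return h(conv(p, u)) - h(u)

def entropy(joint, keep):
    """Entropy in bits of the marginal of `joint` on coordinate indices `keep`."""
    marg = {}
    for key, pr in joint.items():
        k = tuple(key[i] for i in keep)
        marg[k] = marg.get(k, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

def cascade_joint(p, D1, D2):
    """Joint pmf keyed by (x, w1, w2, y) for X -> W2 -> W1 and X -> Y, assuming D2 <= D1 <= 1/2, D2 < 1/2."""
    eta = (D1 - D2) / (1.0 - 2.0 * D2)   # solves eta * D2 = D1
    joint = {}
    for x, w2, w1, y in product((0, 1), repeat=4):
        pr = 0.5                                   # X is uniform
        pr *= D2 if w2 != x else 1 - D2            # BSC(D2): X -> W2
        pr *= eta if w1 != w2 else 1 - eta         # BSC(eta): W2 -> W1
        pr *= p if y != x else 1 - p               # BSC(p): X -> Y
        joint[(x, w1, w2, y)] = pr
    return joint

# Hypothetical values (assumptions).
p, D1, D2 = 0.2, 0.3, 0.05
joint = cascade_joint(p, D1, D2)
H = lambda *idx: entropy(joint, idx)     # coordinates: 0 = x, 1 = w1, 2 = w2, 3 = y

rate = (H(0) + H(1) - H(0, 1)) + (H(0, 1, 3) + H(1, 2, 3) - H(0, 1, 2, 3) - H(1, 3))
closed_form = 1 - h(conv(p, D1)) + G(D2, p)
dist1 = sum(pr for (x, w1, w2, y), pr in joint.items() if w1 != x)
dist2 = sum(pr for (x, w1, w2, y), pr in joint.items() if w2 != x)

print(f"rate = {rate:.6f}, closed form = {closed_form:.6f}, D1 = {dist1:.4f}, D2 = {dist2:.4f}")
```

The check relies only on the Markov structure $W_1 \leftrightarrow W_2 \leftrightarrow X \leftrightarrow Y$ built into `cascade_joint`.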